Topology trivialization and large deviations for the 
minimum in the simplest random optimization. 



Yan V Fyodorov

School of Mathematical Sciences, Queen Mary University of London

London E1 4NS, United Kingdom



Pierre Le Doussal

CNRS-Laboratoire de Physique Theorique de l'Ecole Normale Superieure

24 rue Lhomond, 75231 Paris Cedex-France

Abstract. Finding the global minimum of a cost function given by the sum of a
quadratic and a linear form in $N$ real variables over the $(N-1)$-dimensional sphere is
one of the simplest, yet paradigmatic problems in Optimization Theory known as the
"trust region subproblem" or "constraint least square problem". When both terms in
the cost function are random this amounts to studying the ground state energy of the
simplest spherical spin glass in a random magnetic field. We first identify and study
two distinct large-$N$ scaling regimes in which the linear term (magnetic field) leads
to a gradual topology trivialization, i.e. reduction in the total number $\mathcal{N}_{tot}$ of critical
(stationary) points in the cost function landscape. In the first regime $\mathcal{N}_{tot}$ remains
of the order $N$ and the cost function (energy) has generically two almost degenerate
minima with the Tracy-Widom (TW) statistics. In the second regime the number of
critical points is of the order of unity with a finite probability for a single minimum.
In that case the mean total number of extrema (minima and maxima) of the cost
function is given by the Laplace transform of the TW density, and the distribution of
the global minimum energy is expected to take a universal scaling form generalizing
the TW law. Though the full form of that distribution is not yet known to us, one of
its far tails can be inferred from the large deviation theory for the global minimum. In
the rest of the paper we show how to use the replica method to obtain the probability
density of the minimum energy in the large-deviation approximation by finding both
the rate function and the leading pre-exponential factor.

1. Introduction

The problem of minimizing the quadratic over the sphere

$$E_{min}(\mathbf{h}) = \min_{|\mathbf{x}|=R}\{E_h(\mathbf{x})\}, \qquad E_h(\mathbf{x}) = -\frac{1}{2}\mathbf{x}^T H\mathbf{x} - \mathbf{h}^T\mathbf{x}, \qquad \mathbf{h}, \mathbf{x} \in \mathbb{R}^N, \qquad (1)$$

plays an important role in Optimization Theory as it naturally arises at every step of
iteration in a popular class of nonlinear optimization algorithms called "trust region
methods" [1]. In a different incarnation it is known as the simplest representative
of "constraint least square problems" [2, 3]. For these reasons a lot of effort was
devoted to developing effective numerical algorithms for its solution, especially for large
dimensions, see [4, 5] and references therein. For $\mathbf{h} = 0$ the problem is equivalent to
finding the maximal eigenvalue of the $N \times N$ real symmetric matrix $H$ and in this sense
is straightforward both conceptually and numerically. The case $\mathbf{h} \neq 0$ is equivalent to a
certain "quadratic eigenvalue problem" [2] whose solution can be written in terms of
the roots of an equation involving the resolvent of $H$, see [6] and equations (2) and
(3) below, which makes investigating the properties of the minimum considerably more
challenging. From the point of view of Statistical Mechanics the cost function $E_h(\mathbf{x})$ has
a natural interpretation of the energy associated with a configuration $\mathbf{x}^T = (x_1, \ldots, x_N)$
of $N$ spin variables $x_i$, with $H$ standing for the spin interaction matrix and $\mathbf{h}$ for
the magnetic field. In that context the constraint $|\mathbf{x}| = \sqrt{N}$ defines the so-called
spherical spin model. Further assuming $H$ to be a random $N \times N$ matrix from the
Gaussian Orthogonal Ensemble (GOE) defines the simplest spherical spin-glass model
introduced and studied for $N \gg 1$ long ago by Kosterlitz, Thouless and Jones [7], and
by many authors ever since, see e.g. chap. 4 of the book [8]. The statistics of the
global energy minimum (the ground state) of such a spin glass for $\mathbf{h} = 0$ is trivially
related to the properties of the maximal eigenvalues of GOE matrices. The latter is
by now well-studied in the random matrix theory (RMT) and given by the famous
Tracy-Widom law [9] in the small-deviation regime, and by well known large-deviation
functionals beyond that regime [10, 11, 12, 13, 14]. We also note that there exists a close
and fruitful relation between RMT large deviations functionals, spherical spin glasses,
and the problem of counting minima and saddle points of large-dimensional disordered
surfaces, see [15, 16, 17, 18, 19, 20] and references therein, and section 3 of the
present paper.

Although thermodynamics of the model is simple and does not show such
prominent features as replica-symmetry breaking, dynamics for $\mathbf{h} = 0$ is rich and has
features of aging [21], [22], [IH]. That richness is attributed to a relatively rich energy
landscape topology due to the presence of $2N$ stationary points in the landscape. It
was further noticed by Cugliandolo and Dean in [23] that taking an arbitrarily small
$N$-independent magnetic field $\mathbf{h} \neq 0$ trivializes the topology in the $N \to \infty$ limit
by allowing only two stationary points to survive, the maximum and the minimum
(see sections 2 and 3 below for a detailed discussion). Such an abrupt restructuring
of the landscape indeed was shown to result in washing out the aging effects for any
finite value of the magnetic field [23]. The first main goal of our paper is to provide a
detailed, quantitative picture of the topology trivialization for large but finite $N \gg 1$.
Namely, we will identify and study in some detail the existence of two nontrivial scaling
regimes: $|\mathbf{h}| \sim N^{-1/2}$ and $|\mathbf{h}| \sim N^{-1/6}$. In the former the topology is still complex in
the sense of existence of the order of $N$ stationary points. In the latter the number of
stationary points is finite, gradually decreases with growing field and tends to just two,
a minimum and a maximum, when $|\mathbf{h}|N^{1/6} \gg 1$.

Having understood in some detail the picture of gradual landscape topology 
trivialization we then address the question of statistics of the global energy minimum 
in the presence of a nonzero random magnetic field. The question is not trivial and, to 
the best of our knowledge, has not been much studied. The difficulty is that for $\mathbf{h} \neq 0$
the relation to properties of random matrices is less direct, see the next section for a
discussion, and the powerful RMT tools do not seem to be of obvious utility. 

To that end, simple perturbation-theory insights suggest that in the first scaling
regime $|\mathbf{h}| \sim N^{-1/2}$ the magnetic field is too small to modify the Tracy-Widom statistics
of the global minimum. In contrast, fields of the order $|\mathbf{h}| \sim N^{-1/6}$ do modify statistics
of the extrema, and we expect the distribution of the minimum in that scaling regime to
be given by a family of universal laws generalizing the TW distribution and containing
the latter as a limiting case. Though finding the explicit description of the family
remains an outstanding challenge, one can get some insights from the side of large
deviations pertinent to the case of fields $|\mathbf{h}| \sim O(1)$. From that angle the second
main goal of this paper is to show that the explicit form of the probability density for
the minimum can still be found in the large-deviation regime in some range around
its typical value. This can be done in the framework of the replica trick which we
will use in two alternative ways. Following the first way one extracts the Legendre
transform of the large deviation rate function from analysing the $n$-dependence of
the moments of the partition function. This is very close to the method of Parisi &
Rizzo [24, 25, 26] employed in their recent studies of large deviations of free energy of
the Sherrington-Kirkpatrick model, though we concentrate for our case on the
zero-temperature limit and aim to derive the full large-deviation rate function rather than
its perturbative expansion. That expression in the limit of vanishing field successfully
reproduces the known RMT large-deviation results. We also show that the method
is capable of producing the leading pre-exponential factor by taking into account the
Gaussian fluctuations around the saddle-point solution with the help of
de Almeida-Thouless [28]-inspired fluctuation determinant analysis. In the limit $\mathbf{h} \to 0$ this factor
is found to reproduce correctly the structure of the known RMT pre-factors, up to a
global factor of 2 (accordingly, the naive zero field limit of our expression is exactly
twice the asymptotics of the Tracy-Widom distribution in the small deviations regime).
We will discuss a possible scenario behind such a mismatch. Thus, though our
large-deviation calculations provide a hint that the TW distribution may be tackled using
replicas, recovering the full expression remains a considerable challenge. The calculation
also allows us to predict the form of one of the tails of the (presumably universal)
distribution for the energy minimum at magnetic fields $|\mathbf{h}| \sim N^{-1/6}$.

Finally, in the last section of the paper we suggest an alternative method allowing
us to arrive at the same large-deviation rate function by directly addressing the
probability density for the ground state in the replica limit $n \to 0$. It seems to be
new to the best of our knowledge. We hope that the method, after due modification,
may prove to be useful for studying more complicated optimization problems, such as
large deviation functionals of the ground states in systems which show broken replica
symmetry like more general spherical spin glasses [27] or related disordered models
[29, 30, 31, 32, 33, 34].

The paper has the following structure. We begin with outlining the exact formal 
solution for the minimization problem (1) in terms of the resolvent of the matrix $H$
and briefly discuss how the position of the typical minimum can be inferred from a 
simple RMT consideration. We also use perturbation theory to relate the statistics of 
the minimum for very small magnetic fields to some interesting objects in the random 
matrix theory and further identify two nontrivial scaling regimes for the magnetic field 
related to the gradual topology trivialization. In essence, those two regimes stem from 
the existence of the RMT "bulk" and "edge" spectral regimes. Then we provide the 
explicit calculation of the mean number of critical points in the first scaling regime, show 
that under such a scaling that number is of the order of N and becomes of the order of 
unity when approaching the second scaling regime. The same calculation is extended to
the second scaling regime, where we also relate the mean number of minima to the
Tracy-Widom density. In the rest of the paper we describe two versions of the replica trick
used to derive the large-deviation rate for the minimal energy in two alternative ways, 
and also show how to take into account the fluctuation determinant contribution to 
find the leading pre-exponential factor. Finally, in the conclusion section we formulate 
a few open problems stemming from our research. 

2. Lagrange multiplier minimization. Relation to RMT in perturbative 
and small deviation regimes. 

We begin with outlining the exact formal solution for the minimization problem (1)
given originally in [6]. Applying the Lagrange multiplier method to (1) by adding to
the cost function the term $\frac{t}{2}(\mathbf{x}^T\mathbf{x} - R^2)$ and minimizing yields in the standard way
the argmin $\mathbf{x}_*$ of the cost function as $\mathbf{x}_* = (t_* - H)^{-1}\mathbf{h}$, where the multiplier $t_*$ is the
maximal solution of the following secular equation:

$$R^2 = \mathbf{h}^T\frac{1}{(t-H)^2}\mathbf{h} = \sum_{j=1}^{N}\frac{w_j}{(t-\lambda_j)^2}, \qquad w_j = (\mathbf{h}^T\mathbf{e}_j)(\mathbf{e}_j^T\mathbf{h}), \qquad (2)$$

where $\mathbf{e}_j$ are the orthonormal eigenvectors of $H$ and $\lambda_j$ denote the corresponding real
eigenvalues.

For a generic situation the vector $\mathbf{h}$ is not parallel to one of the eigenvectors and
one can show that $t_* > \max_j\{\lambda_j\}$ [6]. The minimal value of the cost function is then
given by

$$E_{min}(\mathbf{h}) = -\frac{1}{2}\left(t_*R^2 + \mathbf{h}^T\frac{1}{t_*-H}\mathbf{h}\right) = -\frac{1}{2}\left(t_*R^2 + \sum_{j=1}^{N}\frac{w_j}{t_*-\lambda_j}\right) \qquad (3)$$
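As a minimal numerical sketch of equations (2) and (3), the snippet below finds $t_*$ by bisection and compares the energy from (3) with a direct evaluation of (1) at $\mathbf{x}_* = (t_* - H)^{-1}\mathbf{h}$. The function name `solve_secular` and the toy data are illustrative assumptions, and $H$ is taken diagonal so that $\mathbf{h}$ is already expressed in the eigenbasis; this is not a production trust-region solver.

```python
import math
import random

def solve_secular(eigs, w, R2, iters=200):
    """Largest root t* of sum_j w_j/(t - lam_j)^2 = R2, found by bisection."""
    lam_max = max(eigs)
    w_max = sum(wj for lj, wj in zip(eigs, w) if lj == lam_max)
    # For t > lam_max the sum decreases monotonically in t, and
    # w_max/(t-lam_max)^2 <= sum <= sum(w)/(t-lam_max)^2 brackets the root.
    lo = lam_max + math.sqrt(w_max / R2)
    hi = lam_max + math.sqrt(sum(w) / R2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lhs = sum(wj / (mid - lj) ** 2 for lj, wj in zip(eigs, w))
        if lhs > R2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy data: eigenvalues of H and components of h in the eigenbasis.
eigs = [-1.3, -0.2, 0.4, 1.1]
h = [0.5, -0.3, 0.8, 0.2]
R2 = float(len(eigs))                      # spherical constraint R^2 = N
w = [hj * hj for hj in h]

t_star = solve_secular(eigs, w, R2)
x_star = [hj / (t_star - lj) for hj, lj in zip(h, eigs)]   # x* = (t* - H)^{-1} h
norm2 = sum(xj * xj for xj in x_star)

def energy(x):
    """Cost function (1) for diagonal H."""
    quad = sum(lj * xj * xj for lj, xj in zip(eigs, x))
    lin = sum(hj * xj for hj, xj in zip(h, x))
    return -0.5 * quad - lin

E_direct = energy(x_star)
E_formula = -0.5 * (t_star * R2 + sum(wj / (t_star - lj) for wj, lj in zip(w, eigs)))

# Crude sanity check: random points on the sphere should never beat the minimum.
random.seed(1)
best_random = float("inf")
for _ in range(2000):
    g = [random.gauss(0.0, 1.0) for _ in eigs]
    s = math.sqrt(R2 / sum(gj * gj for gj in g))
    best_random = min(best_random, energy([s * gj for gj in g]))

print(t_star, norm2, E_direct, E_formula, best_random)
```

The bracket used for the bisection relies only on the monotonic decay of the right-hand side of (2) for $t > \lambda_{max}$, so it does not need a starting guess.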

In the remainder of this paper we consider $N \times N$ matrices $H \in$ GOE distributed
according to the weight $\mathcal{P}(H) \propto \exp\left(-\frac{N}{4J^2}\mathrm{Tr}H^2\right)$, i.e. entries $H_{ij} = H_{ji}$ are independent
mean zero Gaussian real variables with variances $\langle H_{ij}^2\rangle = J^2/N$ for $i < j$, and
$\langle H_{ii}^2\rangle = 2J^2/N$. We treat also the components $h_i$ of the field $\mathbf{h}$ as independent,
identically distributed random Gaussian variables with zero mean and the variance
$\langle h_i^2\rangle = \sigma^2$, and use the spherical model constraint $R^2 = N$. To this end we
would like to note that had we replaced the random field term $\mathbf{h}^T\mathbf{x}$ in (1) with
the random anisotropy term $(\mathbf{h}^T\mathbf{x})^2$ the resulting energy function could be written as
$E_h(\mathbf{x}) = -\frac{1}{2}\mathbf{x}^T\left(H + 2\mathbf{h}\otimes\mathbf{h}^T\right)\mathbf{x}$. The minimization problem would then amount to
studying the maximal eigenvalue of a rank-one random perturbation of GOE, which
attracted considerable interest recently, see e.g. [35, 36, 37, 38], and whose large
deviation functional is known explicitly [39]. In contrast, the problem with magnetic
field is not a simple eigenvalue problem but is equivalent to a much less studied
class of quadratic eigenvalue problems [2]. In particular, it is straightforward to
show that the secular equation (2) for the Lagrange multipliers $t$ can be rewritten
as $\det\left[(t - H)^2 - \frac{1}{N}\mathbf{h}\otimes\mathbf{h}^T\right] = 0$. For such a problem it is less evident how to employ
standard RMT methods of large-deviation analysis, such as the powerful Coulomb gas
method [11, 12].

What is simple to understand is why generically for $|\mathbf{h}| \sim \sigma$ of order of unity
and large $N \gg 1$ the secular equation (2) should have only two solutions, as well
as to find the typical values of $t$ and the minimum energy in that case [23]. First
we recall that the typical spectrum of GOE matrices in the chosen normalisation is
located in the interval $(-2J, 2J)$. This implies that a typical separation $\Delta$ between
neighbouring eigenvalues in that interval is of the order of $\Delta \sim JN^{-1}$. We immediately
see that for any $t \in (-2J, 2J)$ the right-hand side in (2) is typically of the order of
$\sigma^2\Delta^{-2} \sim (\sigma^2/J^2)N^2$, whereas the left hand side is $R^2 = N$. Therefore only for small
magnetic fields of the order $\sigma \sim JN^{-1/2}$ such equation may have its solution $t$ in
the interval $t \in (-2J, 2J)$. In the next section we will find the mean number $\mathcal{N}_{tot}$ of
solutions in such a regime as a function of the parameter $\gamma \sim N\sigma^2/J^2 = O(1)$. We will
find that $\mathcal{N}_{tot}$ is proportional to $N$ and gradually decreases with growth of $\gamma$ reflecting
the phenomenon of topology trivialization. When the magnetic field reaches the scale
$\sigma/J \sim N^{-1/6}$ the mean number of solutions in the interval $t \in (-2J, 2J)$ is of the order
of unity, and eventually, for $\sigma/J = O(1)$ there will be typically only two solutions,
both outside that interval, with a single solution $t = t_* > 2J$ corresponding to the
energy minimum, and similarly another one with $t < -2J$ corresponding to the energy
maximum.
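The counting argument above can be illustrated numerically. The sketch below counts the real roots of the secular equation (2) for a sequence of growing field strengths. It rests on loudly stated assumptions: the eigenvalues are placed deterministically at semicircle quantiles instead of being sampled from the GOE, the weights are squares of standard Gaussians scaled by $\sigma^2$, and all function names are illustrative. The resulting numbers are therefore only indicative of the trend, not of the exact GOE averages.

```python
import math
import random

def semicircle_cdf(x, J):
    """CDF of the semicircle law supported on (-2J, 2J)."""
    root = math.sqrt(max(4.0 * J * J - x * x, 0.0))
    return 0.5 + x * root / (4.0 * math.pi * J * J) + math.asin(x / (2.0 * J)) / math.pi

def semicircle_quantiles(N, J):
    """N points lam_j with CDF(lam_j) = (j + 1/2)/N (stand-in for a GOE spectrum)."""
    out = []
    for j in range(N):
        p = (j + 0.5) / N
        lo, hi = -2.0 * J, 2.0 * J
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid, J) < p:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out

def gap_minima(eigs, g2):
    """Minimum of f(t) = sum_j g2_j/(t - lam_j)^2 inside each gap between eigenvalues."""
    f = lambda t: sum(gj / (t - lj) ** 2 for lj, gj in zip(eigs, g2))
    df = lambda t: sum(-2.0 * gj / (t - lj) ** 3 for lj, gj in zip(eigs, g2))
    minima = []
    for a, b in zip(eigs, eigs[1:]):
        # f is convex on (a, b) and diverges at both ends, so f' has a single zero.
        lo, hi = a, b
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if df(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        minima.append(f(0.5 * (lo + hi)))
    return minima

def count_real_roots(minima, sigma, R2):
    """Roots of sigma^2 f(t) = R2: one on each side of the spectrum, 0 or 2 per gap."""
    return 2 + 2 * sum(1 for m in minima if sigma ** 2 * m < R2)

random.seed(0)
N, J = 100, 1.0
eigs = semicircle_quantiles(N, J)
g2 = [random.gauss(0.0, 1.0) ** 2 for _ in range(N)]
minima = gap_minima(eigs, g2)

sigmas = [1e-4, 0.05, 0.1, 0.2, 0.5, 10.0]
counts = [count_real_roots(minima, s, float(N)) for s in sigmas]
for s, c in zip(sigmas, counts):
    print(f"sigma = {s:g}: {c} real solutions")
```

Because the right-hand side of (2) scales as $\sigma^2$, the minima inside the gaps need to be located only once; increasing the field can only remove pairs of roots, never create them.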

To find the typical values of the Lagrange multiplier $t_*$ and of the minimum energy
$E_{min}(\mathbf{h})$ for $\sigma = O(1)$ one may then take into account that the linear and quadratic
terms in the cost function are not correlated and argue that the typical value of $t_*$ can
be obtained by replacing (2) with its ensemble averaged version:

$$1 = \sigma^2\int_{-2J}^{2J}\frac{\rho_{sc}(\lambda)}{(t_*-\lambda)^2}\,d\lambda, \qquad \rho_{sc}(\lambda) = \frac{1}{2\pi J^2}\sqrt{4J^2-\lambda^2} \qquad (4)$$

Here we used that the profile of the mean eigenvalue density in the interval $(-2J, 2J)$
is given by the semicircular law $\rho_{sc}(\lambda) = \lim_{N\to\infty}\frac{1}{N}\left\langle\sum_{i}\delta(\lambda-\lambda_i)\right\rangle$ with $\lambda_i$ being
the $N$ eigenvalues of $H$ and brackets standing for the ensemble averaging. As a typical
maximal Lagrange multiplier satisfies $t_* > \lambda_{max} = 2J$, the integral in the right-hand side can
be shown to be equal to $\frac{1}{2J^2}\left(\frac{t_*}{\sqrt{t_*^2-4J^2}} - 1\right)$. Solving then the resulting equation and
applying similar treatment to (3) one finds after simple manipulations [23]:

$$t_*^{(typ)} = \frac{2J^2+\sigma^2}{\sqrt{J^2+\sigma^2}}, \qquad E_{min}^{(typ)}(\sigma) = -N\sqrt{J^2+\sigma^2} \qquad (5)$$

We will see later on in the paper that $E_{min}^{(typ)}(\sigma)$ is indeed both the typical and the
average value of the ground state energy of the spherical spin glass as given by the
replica trick.
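The typical values in (5) can be checked by direct quadrature. The sketch below, with illustrative parameter values $J = 1$, $\sigma = 0.7$ and a hypothetical helper `resolvent_moments`, integrates the semicircular density against $(t-\lambda)^{-1}$ and $(t-\lambda)^{-2}$ using the substitution $\lambda = 2J\cos\theta$, and then verifies both the averaged secular equation (4) and the energy per spin.

```python
import math

def resolvent_moments(t, J, n=4000):
    """Return (int rho_sc/(t-lam) dlam, int rho_sc/(t-lam)^2 dlam) for t > 2J.

    With lam = 2J cos(theta) the semicircle measure becomes (2/pi) sin^2(theta) dtheta.
    """
    g1 = 0.0
    g2 = 0.0
    dtheta = math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta                 # midpoint rule
        weight = (2.0 / math.pi) * math.sin(theta) ** 2 * dtheta
        d = t - 2.0 * J * math.cos(theta)
        g1 += weight / d
        g2 += weight / d ** 2
    return g1, g2

J, sigma = 1.0, 0.7                                # illustrative values only
t_typ = (2.0 * J ** 2 + sigma ** 2) / math.sqrt(J ** 2 + sigma ** 2)
g1, g2 = resolvent_moments(t_typ, J)

secular_rhs = sigma ** 2 * g2                      # should equal 1 by equation (4)
e_numeric = -0.5 * (t_typ + sigma ** 2 * g1)       # energy per spin from (3), averaged
e_formula = -math.sqrt(J ** 2 + sigma ** 2)        # energy per spin from (5)

print(t_typ, secular_rhs, e_numeric, e_formula)
```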

Note that for $\mathbf{h} = 0$ to each solution $t = \lambda_j$ of the stationarity equation (2)
correspond exactly two different critical points of the cost function landscape with
the same energy, as changing $\mathbf{x} \to -\mathbf{x}$ does not change the cost function (1). Thus
we must have altogether $2N$ critical points. As for vanishing field we must have
$t_* = \max\{\lambda_1, \ldots, \lambda_N\} = \lambda_{max}$ it is reasonable to try to study the case of very weak
fields by developing a perturbation theory around $\lambda_{max}$.

It can be done most conveniently by introducing a small parameter, the typical
scale $\sigma$ of the field, via formally defining $w_j = \sigma^2\tilde{w}_j$, where now $\tilde{w}_j$ are considered
to be of the order unity, and looking for the solution $t_*$ as a series in powers of $\sigma$.
The straightforward manipulations yield for the first two nonvanishing terms of the
expansion the following expression:

$$t_* = \lambda_{max} + \sigma\sqrt{\frac{\tilde{w}_m}{N}} + \frac{\sigma^3}{2N}\sqrt{\frac{\tilde{w}_m}{N}}\,{\sum_j}'\frac{\tilde{w}_j}{(\lambda_{max}-\lambda_j)^2} + \ldots \qquad (6)$$

where the sum goes over all $j = 1, \ldots, N$ excluding the terms with $\lambda_j = \lambda_{max}$, and we
denoted $\sigma^2\tilde{w}_m = (\mathbf{h}^T\mathbf{e}_m)(\mathbf{e}_m^T\mathbf{h})$ where $\mathbf{e}_m$ stands for the eigenvector corresponding to
the maximal eigenvalue. Further substituting (6) to (3) yields a perturbative expansion
for the minimal energy in the form

$$E_{min} = -\frac{N}{2}\lambda_{max} - \sigma\sqrt{N\tilde{w}_m} - \frac{\sigma^2}{2}{\sum_j}'\frac{\tilde{w}_j}{\lambda_{max}-\lambda_j} + \ldots \qquad (7)$$

Using this expression one can try, in principle, to study statistics of the perturbed
ground state by relating it to properties of random matrices. For example, to the first
order in $\sigma$ the ground state is equal to the sum of two independent variables since
eigenvalues and eigenvectors of the random matrix are independent of each other. As is
well-known, in the large-$N$ limit $\lambda_{max} = 2J\left(1 + \frac{1}{2}\zeta N^{-2/3}\right)$, with random $\zeta$ following the
$\beta = 1$ Tracy-Widom distribution [9]. On the other hand, it is easy to see that $\tilde{w}_j = w$
are all distributed with the probability density $\mathcal{P}(w) = \frac{1}{\sqrt{2\pi w}}e^{-w/2}$. The ground state
distribution is then the simple convolution of the two. Much less trivial are terms of the
order $\sigma^2$ and higher in the series (7). In the language of the random matrix theory the
second-order term can be interpreted as the so-called "level curvature" associated with
the largest eigenvalue. To find the distribution of this particular type of level curvature
is a rather challenging RMT problem not yet solved (see a detailed discussion and
description of the problematic for GUE matrices in [10]), though in the bulk of the
spectrum related objects for GOE were successfully investigated long ago jHl WI\ .
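A small numerical check of the weak-field expansion may be helpful here. The sketch below compares the exact minimum obtained from (2)-(3) with the truncated series $-\frac{N}{2}\lambda_{max} - \sigma\sqrt{N\tilde{w}_m} - \frac{\sigma^2}{2}\sum_j' \tilde{w}_j/(\lambda_{max}-\lambda_j)$, and verifies that the discrepancy shrinks as $\sigma^3$. The eigenvalues and weights are arbitrary illustrative numbers, and the helper names are assumptions made for this example.

```python
import math

def exact_min_energy(eigs, w_tilde, sigma, N, iters=200):
    """E_min from the secular equation (2) and formula (3), with w_j = sigma^2 w~_j."""
    w = [sigma ** 2 * x for x in w_tilde]
    lam_max = max(eigs)
    w_max = sum(wj for lj, wj in zip(eigs, w) if lj == lam_max)
    lo = lam_max + math.sqrt(w_max / N)          # the secular sum is >= N here
    hi = lam_max + math.sqrt(sum(w) / N)         # the secular sum is <= N here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(wj / (mid - lj) ** 2 for lj, wj in zip(eigs, w)) > N:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return -0.5 * (t * N + sum(wj / (t - lj) for lj, wj in zip(eigs, w)))

def perturbative_min_energy(eigs, w_tilde, sigma, N):
    """Zeroth, first and second order in sigma of the weak-field expansion."""
    m = max(range(len(eigs)), key=lambda j: eigs[j])
    lam_max = eigs[m]
    first = sigma * math.sqrt(N * w_tilde[m])
    second = 0.5 * sigma ** 2 * sum(
        w_tilde[j] / (lam_max - eigs[j]) for j in range(len(eigs)) if j != m
    )
    return -0.5 * N * lam_max - first - second

# Hypothetical toy spectrum and order-one weights.
eigs = [-1.5, -0.5, 0.2, 0.9, 1.6]
w_tilde = [0.7, 1.3, 0.4, 2.1, 0.9]
N = float(len(eigs))

def truncation_error(sigma):
    return abs(exact_min_energy(eigs, w_tilde, sigma, N)
               - perturbative_min_energy(eigs, w_tilde, sigma, N))

err_large = truncation_error(1e-2)
err_small = truncation_error(1e-3)
print(err_large, err_small, err_large / err_small)   # ratio close to 10**3
```

Reducing $\sigma$ by a factor of ten reduces the error by roughly a factor of a thousand, as expected when the first neglected term is of order $\sigma^3$.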

One also can use the perturbation expansion (7) to estimate the scale $\sigma$ of the
magnetic field at which all terms in the series for the rescaled shift $\delta E_m$ of the minimal
energy per spin (measured in units of $J$) become
typically of the same order. Using that for $N \gg 1$ the typical eigenvalue separation
between $\lambda_{max}$ and the second largest eigenvalue is of the order $\Delta \sim JN^{-2/3}$ we see
that the scale in question is given by $\sigma \sim \sqrt{N}\Delta \sim JN^{-1/6}$. For such values of the
magnetic field we then have $\delta E_m \sim \Delta/J \sim N^{-2/3}$. It is natural to expect that in such
a regime the probability density of the scaled random variable $\zeta = \delta E_m N^{2/3}$ will be
given by a universal family of distributions shared by minimization problems (1) for a
broad class of random matrix ensembles and of the magnetic field distribution. The
family is parametrized by the scaling variable $\kappa = N^{1/3}\sigma^2/J^2$ and is a very natural
generalization of the Tracy-Widom law (and contains the latter as a limiting case at
$\kappa = 0$). To understand its properties is yet another challenging open problem. In
section 4 we will be able to understand the far tail of such a distribution from matching to
the large deviation result.

Very similarly one can develop perturbation theory for small $\sigma$ around any solution
$t_j^{(0)} = \lambda_j$ of the secular equation (2) for $\sigma = 0$, with $\lambda_j$ in the bulk of the spectrum
$(-2J, 2J)$. It will be of the same type as (6)-(7), but with $\lambda_{max}$ replaced by $\lambda_j$. In fact
around each $t_j^{(0)}$ we will have two perturbative solutions $t_j^{(\pm)}$ different by the sign in
front of the perturbative terms. Using that the typical eigenvalue separation is $\Delta \sim JN^{-1}$
in the bulk of the spectrum one can estimate that all the terms of the perturbation theory
are of the same order for $\sigma \sim JN^{-1/2}$. This is the same scaling as anticipated for the
regime of gradual trivialization of the landscape topology. We are going to study the
phenomenon of topology trivialization quantitatively in the next section.

3. Two-stage trivialization of the cost function landscape topology: 
quantitative considerations. 

Let us denote the mean of the total number of all stationary points for a random field
on a manifold as $\mathcal{N}_{tot}$. A general framework for calculating that number for stationary
Gaussian fields was developed in y^ for the unconstrained case and extended in [19, 20] to
the case of spherically constrained isotropic fields pertinent to our problem. As the most
convenient expressions for the mean total number of points with a given index in the
spherically constrained case, see (9) and (10) below, were not written down explicitly
in [19, 20] we give below a brief derivation using equation (5.2) of [20] as the starting
point. It concerns the mean number $\mathbb{E}\{C_N^{(k)}(B)\}$ of critical (stationary) points with a
given index (i.e. the number of positive eigenvalues of the Hessian) $k = 0, 1, 2, \ldots, N-1$
such that the values of the cost function $E_h(\mathbf{x})$ restricted to the sphere $|\mathbf{x}| = \sqrt{N}$ at
those critical points lie in a Borel set $B \subset \mathbb{R}$. That object was shown to be given for
all $N$ by:

E{Ci^\B)} = C{N, u', u) J Egoe |e?(^^+i-^')e-'^('''+^-^)' | dy (8) 

where the expectation in the right-hand side goes over the random variable $\lambda_{k+1}$ which
is distributed as the $(k+1)$-th lowest eigenvalue of the standard GOE random matrices
with the variance chosen to satisfy $J^2 = 1/2$. In the above formula $\nu' = \frac{d}{dy}\nu(y)|_{y=1}$,
$\nu'' = \frac{d^2}{dy^2}\nu(y)|_{y=1}$ and it is valid for a centered isotropic Gaussian field on the sphere with a
covariance function $\nu(y)$ defined by the identity: $\mathbb{E}\{E_h(\mathbf{x}_1)E_h(\mathbf{x}_2)\} = N\nu\left(\frac{1}{N}\mathbf{x}_1^T\mathbf{x}_2\right)$.
We denoted $\alpha^2 = \nu'' + \nu' - \nu'^2$. The factor $C(N, \nu', \nu'')$ is given explicitly by
C(N, ν', ν'') = 2 ( ^^/^ ) -7=j7 (77) § To count the totality of all stationary points
with a given index $k$ irrespective of the values taken by the cost function we set $B$
to coincide with the real line $\mathbb{R}$. We then can perform the (Gaussian) integral over the
variable $y$ and get

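$$\mathbb{E}\{C_N^{(k)}(\mathbb{R})\} = 2\left(\frac{2\nu'}{\nu''+\nu'}\right)^{1/2}\left(\frac{\nu''}{\nu'}\right)^{N/2}\mathbb{E}_{GOE}\left\{e^{-N\frac{\nu''-\nu'}{2(\nu''+\nu')}\lambda_{k+1}^2}\right\} \qquad (9)$$

Finally we can sum over all index values $k$ and exploit the identity $\sum_{k=0}^{N-1}F(\lambda_{k+1}) =
N\int F(\lambda)\rho_N(\lambda)\,d\lambda$ where $\rho_N(\lambda) = \frac{1}{N}\sum_{k=0}^{N-1}\delta(\lambda-\lambda_{k+1})$ is the exact eigenvalue density of
GOE. As the result we arrive at the following compact expression of general validity
for the total number $\mathcal{N}_{tot}$ of stationary points in the spherical model:

$$\mathcal{N}_{tot} = 2N\left(\frac{2\nu'}{\nu''+\nu'}\right)^{1/2}\left(\frac{\nu''}{\nu'}\right)^{N/2}\int_{-\infty}^{\infty}\mathbb{E}_{GOE}\{\rho_N(\lambda)\}\,e^{-N\frac{\nu''-\nu'}{2(\nu''+\nu')}\lambda^2}\,d\lambda \qquad (10)$$

In the so-called "pure" case $\nu(y) = y^p$ of a $p$-spin model, considered in jT^, we have
$\nu' = p$, $\nu'' = p(p-1)$ and the eq. (10) reproduces eq. (2.9) of that paper.

It is easy to see that the cost function (1) corresponds to the choice $\nu(y) =
\frac{J^2}{2}y^2 + \sigma^2y$ which yields $\nu' = J^2 + \sigma^2$, $\nu'' = J^2$ and eq. (10) assumes the form ||

$$\mathcal{N}_{tot} = 2N\left(\frac{2(J^2+\sigma^2)}{2J^2+\sigma^2}\right)^{1/2}\left(\frac{J^2}{J^2+\sigma^2}\right)^{N/2}\int_{-\infty}^{\infty}\mathbb{E}_{GOE}\{\rho_N(\lambda)\}\,e^{\frac{N\sigma^2}{2(2J^2+\sigma^2)}\lambda^2}\,d\lambda \qquad (11)$$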

The above expression is exact for any $N$, and one can provide also the exact expression
for the mean eigenvalue density $\mathbb{E}_{GOE}\{\rho_N(\lambda)\}$ in terms of the Hermite polynomials, see
e.g. [13] or eqs. (3.12)-(3.13) in y^. As such it can hopefully be useful for comparison
with the results of direct numerical simulations of the spherical model landscape, see
[33] for a recent work of that kind. We however are interested mainly in the limit
$N \to \infty$ where according to the earlier discussion we expect a nontrivial behaviour
to occur at the scale $\sigma \sim N^{-1/2}J$. Indeed, introducing the parameter $\gamma = N\frac{\sigma^2}{2J^2}$ and
performing the limit $N \to \infty$ for a fixed finite $\gamma$ we arrive at the following expression:

$$\lim_{N\to\infty}\frac{\mathcal{N}_{tot}}{2N} = \mathcal{N}(\gamma) = e^{-\gamma}\int_{-\sqrt{2}}^{\sqrt{2}}\frac{d\lambda}{\pi}\sqrt{2-\lambda^2}\,e^{\frac{\gamma}{2}\lambda^2}, \qquad \gamma = N\frac{\sigma^2}{2J^2} \qquad (12)$$

where we have used that the limiting eigenvalue density has the semicircular profile
(4) in the interval $(-\sqrt{2}, \sqrt{2})$. One can further simplify this expression by introducing
$\lambda = \sqrt{2}\cos\theta$, $\theta \in [0, \pi]$ and noticing that the resulting integral can be related to the
Bessel function of imaginary argument $I_0(z)$. This yields finally the expression

$$\mathcal{N}(\gamma) = -2\frac{d}{d\gamma}\left(e^{-\gamma/2}I_0\left(\frac{\gamma}{2}\right)\right) \qquad (13)$$

with the small-$\gamma$ expansion $\mathcal{N}(\gamma) = 1 - \frac{3}{4}\gamma + \frac{5}{16}\gamma^2 + O(\gamma^3)$. In particular, $\mathcal{N}(\gamma = 0) = 1$
and $\mathcal{N}(\gamma)$ monotonically decreases with growing $\gamma$, being of the order of unity for any finite
$\gamma < \infty$. This function is plotted in Fig. 1.

§ Note that the value of $C(N, \nu', \nu'')$ given in eq. (5.2) of [20] misses the factor ^=7-

|| Note that though the treatment of [20] was formally restricted to covariances of the form $\nu(y) =
y^2 + \ldots$, one can check that inclusion of the linear term in the expansion does not invalidate their
formalism.
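The closed form (13) is easy to cross-check against the integral (12). The sketch below is a plain-Python illustration, with hypothetical helper names, that evaluates $\mathcal{N}(\gamma) = e^{-\gamma/2}\left[I_0(\gamma/2) - I_1(\gamma/2)\right]$ (which is what the derivative in (13) gives, since $I_0' = I_1$) via the power series of the Bessel functions, compares it with a midpoint quadrature of (12) after the substitution $\lambda = \sqrt{2}\cos\theta$, and looks at the small- and large-$\gamma$ behaviour.

```python
import math

def bessel_i(nu, x, terms=200):
    """Modified Bessel function I_nu(x), integer nu >= 0, from its power series."""
    term = (x / 2.0) ** nu / math.factorial(nu)
    total = term
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * (k + nu))
        total += term
    return total

def n_closed(gamma):
    """Formula (13) with the derivative carried out explicitly."""
    x = gamma / 2.0
    return math.exp(-x) * (bessel_i(0, x) - bessel_i(1, x))

def n_quadrature(gamma, n=20000):
    """Formula (12): exp(-gamma) * (2/pi) * int_0^pi sin^2(th) exp(gamma cos^2(th)) dth."""
    dth = math.pi / n
    total = 0.0
    for k in range(n):
        th = (k + 0.5) * dth
        # exp(-gamma) is merged into the exponent: cos^2 - 1 = -sin^2 <= 0
        total += math.sin(th) ** 2 * math.exp(-gamma * math.sin(th) ** 2)
    return (2.0 / math.pi) * total * dth

gammas = [0.3, 2.0, 10.0]
diffs = [abs(n_closed(g) - n_quadrature(g)) for g in gammas]

g_small = 0.01
small_gamma_error = abs(n_closed(g_small) - (1.0 - 0.75 * g_small + (5.0 / 16.0) * g_small ** 2))

g_large = 100.0
large_gamma_ratio = n_closed(g_large) / (g_large ** -1.5 / math.sqrt(math.pi))

print(diffs, small_gamma_error, large_gamma_ratio)
```

The last ratio approaches one for large $\gamma$, which is the $\gamma^{-3/2}$ decay responsible for the second scaling regime.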

We conclude that for any $\sigma/J \sim N^{-1/2}$ the total number of stationary points is
asymptotically of the order of $N$ and therefore one may expect nontrivial aging effects to
take place. Let us mention that it is natural to expect that the mechanism of reduction
of the number of real solutions of the secular equation (2) is by pairwise collisions of
the real roots as a function of the growing magnetic field and disappearance of the pair
into the complex plane. The last removed are to be stationary points corresponding to
the Lagrange multipliers $t$ with values close to the spectral edges $\pm 2J$. Analytical and
numerical understanding of that picture, as well as investigating statistics of solutions
of the equation (2) in the crossover regime, and statistics of the cost function values
(energies) at critical points at finite $\gamma$ seem to us interesting open problems deserving
further attention.




Figure 1. Mean number of stationary points (divided by $2N$) as a function of
$\gamma = N\sigma^2/(2J^2)$ in the first scaling regime $\sigma/J = O(N^{-1/2})$, from the formula (12).

The formula (13) can be further used to infer the existence of yet another relevant
magnetic field scale $\sigma$ such that the process of landscape trivialisation enters its final
stage. This happens when $\mathcal{N}_{tot}$ drops to the values of order of unity. Exploiting the
asymptotic $I_0(z \gg 1) \approx e^z/\sqrt{2\pi z}$ we obtain $\mathcal{N}(\gamma \gg 1) \approx \frac{1}{\sqrt{\pi}}\gamma^{-3/2}$. Second stage then
corresponds to $\mathcal{N}(\gamma)$ of the order of $1/N$ which occurs at $\gamma \sim N^{2/3}$, that is $\sigma/J \sim N^{-1/6}$.
From our previous consideration we have seen that this was precisely the scale when
the magnetic field term started to affect the statistics of the global energy minimum,
with the scaling parameter now being $\kappa = 2\gamma N^{-2/3} = N^{1/3}\sigma^2/J^2$.

Figure 2. Mean number of stationary points as a function of $\kappa = N^{1/3}\sigma^2/J^2$ in
the second scaling regime $\sigma/J = O(N^{-1/6})$, from the formula (15). The asymptotic
formula for small $\kappa$, Eq. (17), is also indicated as the lower curve. For $\kappa \to \infty$ the
mean number converges to the minimal possible value 2 (see the text).



In fact, it is easy to understand that if we like to know the precise number of
stationary points in that new scaling regime the expression (12) should be replaced
with a more accurate formula. This can be most easily seen by the fact that (12)
in such a regime is dominated by the vicinities of the spectral edges $\lambda = \pm\sqrt{2}$ of
the widths $|\lambda \pm \sqrt{2}| \sim N^{-2/3}$ where the semicircular law should be replaced by a
more accurate expression. Using the symmetry $\mathbb{E}_{GOE}\{\rho_N(\lambda)\} = \mathbb{E}_{GOE}\{\rho_N(-\lambda)\}$ we
can restrict integration in (10) to $\lambda \in [0, \infty)$ multiplying the result by the factor of
two, and so it is enough to consider the scaling vicinity of only one edge $\lambda = \sqrt{2}$.
Introducing $\lambda = \sqrt{2}\left(1 + \frac{\zeta}{2N^{2/3}}\right)$ one finds that $\mathbb{E}_{GOE}\{\rho_N(\lambda)\} \approx N^{-1/3}\sqrt{2}\,\rho_{edge}(\zeta)$ where
the explicit expression for $\rho_{edge}(\zeta)$ is given by [15]

$$\rho_{edge}(\zeta) = [Ai'(\zeta)]^2 - \zeta[Ai(\zeta)]^2 + \frac{1}{2}Ai(\zeta)\left(1 - \int_\zeta^\infty Ai(\eta)\,d\eta\right) \qquad (14)$$

where $Ai(\zeta) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{i\left(\frac{k^3}{3}+k\zeta\right)}dk$ is the Airy function solving the differential equation
$Ai''(\zeta) - \zeta Ai(\zeta) = 0$.

Performing the corresponding limit $N \to \infty$ in (11) keeping $\kappa = N^{1/3}\sigma^2/J^2$ finite
we get the exact expression for the limiting number of critical points in this regime as

$$\mathcal{N}_{tot}(\kappa) = 4e^{-\frac{\kappa^3}{24}}\int_{-\infty}^{\infty}e^{\frac{\kappa\zeta}{2}}\rho_{edge}(\zeta)\,d\zeta, \qquad \kappa = N^{1/3}\sigma^2/J^2 \qquad (15)$$



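We see that it always remains of the order of unity. This function is plotted in Fig.
2. We can easily extract the values for $\mathcal{N}_{tot}(\kappa)$ for $\kappa \ll 1$ and $\kappa \gg 1$ using the known
asymptotic behaviour:

$$\rho_{edge}(\zeta \to -\infty) \approx \frac{\sqrt{|\zeta|}}{\pi}, \qquad \rho_{edge}(\zeta \to +\infty) \approx \frac{1}{2}Ai(\zeta) \approx \frac{1}{4\sqrt{\pi}\zeta^{1/4}}\exp\left\{-\frac{2}{3}\zeta^{3/2}\right\} \qquad (16)$$

The small-$\kappa$ behaviour of $\mathcal{N}_{tot}(\kappa)$ is obviously controlled by the $\zeta \to -\infty$ asymptotics, and
we have:

$$\mathcal{N}_{tot}(\kappa \ll 1) \approx 4\int_{-\infty}^{0}\frac{\sqrt{|\zeta|}}{\pi}e^{\frac{\kappa\zeta}{2}}\,d\zeta = \frac{4\sqrt{2}}{\sqrt{\pi}\,\kappa^{3/2}} \gg 1 \qquad (17)$$

which precisely matches the $\mathcal{N}(\gamma \gg 1) \propto \gamma^{-3/2}$ behaviour obtained by us earlier. On
the other hand, the behaviour of $\mathcal{N}_{tot}(\kappa \gg 1)$ is controlled by the $\zeta \to \infty$ asymptotic:

$$\mathcal{N}_{tot}(\kappa \gg 1) \approx \frac{e^{-\frac{\kappa^3}{24}}}{\sqrt{\pi}}\int_0^\infty\frac{d\zeta}{\zeta^{1/4}}\,e^{-\frac{2}{3}\zeta^{3/2}+\frac{\kappa\zeta}{2}} = \frac{\kappa^{3/2}e^{-\frac{\kappa^3}{24}}}{\sqrt{\pi}}\int_0^\infty\frac{du}{u^{1/4}}\,e^{-\kappa^3\left(\frac{2}{3}u^{3/2}-\frac{u}{2}\right)} \qquad (18)$$

where we have made a substitution $\zeta = u\kappa^2$ to make it evident that the integral in
the limit $\kappa \gg 1$ can be evaluated by the Laplace method around the stationary point
$u = 1/4$. Equivalently we can use (16) and the identity

$$\int_{-\infty}^{+\infty}d\zeta\,Ai(\zeta)\,e^{\frac{\kappa\zeta}{2}} = e^{\frac{\kappa^3}{24}} \qquad (19)$$

for any $\kappa > 0$. The straightforward calculation then yields $\lim_{\kappa\to\infty}\mathcal{N}_{tot}(\kappa) = 2$.
This is the minimal possible value implying the existence of a single minimum and
single maximum only.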
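The Laplace-method statement can be checked numerically. The sketch below evaluates the large-$\kappa$ expression, $\frac{\kappa^{3/2}e^{-\kappa^3/24}}{\sqrt{\pi}}\int_0^\infty u^{-1/4}e^{-\kappa^3\left(\frac{2}{3}u^{3/2}-\frac{u}{2}\right)}du$, by a midpoint rule after the substitution $u = v^4$, which removes the integrable singularity at the origin. Note that this is the asymptotic form only, not the exact $\mathcal{N}_{tot}(\kappa)$ of (15), and the function name and cut-off `vmax` are choices made for this illustration.

```python
import math

def ntot_large_kappa(kappa, n=40000, vmax=1.6):
    """Asymptotic large-kappa form of N_tot, integrated numerically with u = v**4."""
    big = kappa ** 3
    h = vmax / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * h
        u = v ** 4
        # The prefactor exp(-kappa^3/24) is absorbed into the exponent, which is
        # then non-positive everywhere and vanishes at the stationary point u = 1/4.
        exponent = -big * ((2.0 / 3.0) * u ** 1.5 - 0.5 * u + 1.0 / 24.0)
        total += 4.0 * v * v * math.exp(exponent)      # u^(-1/4) du = 4 v^2 dv
    return kappa ** 1.5 / math.sqrt(math.pi) * total * h

v6 = ntot_large_kappa(6.0)
v10 = ntot_large_kappa(10.0)
print(v6, v10)    # both close to the limiting value 2
```

The deviation from 2 shrinks as $\kappa$ grows, consistent with corrections of relative order $\kappa^{-3}$ to the leading Laplace estimate.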

In fact not only the mean total number of all critical points of the cost functional,
but the mean number of true extrema (minima or maxima) can be found in explicit
form, and in the scaling regime $\sigma/J \sim N^{-1/6}$ is very directly related to the famous
Tracy-Widom distribution [9]. Indeed, minima correspond to the index $k = 0$, and
their mean number is accounted by $\mathbb{E}\{C_N^{(0)}(\mathbb{R})\}$ from (9) so is related to the statistics
of the minimum eigenvalue $\lambda_1$ (cf. [H]). Introducing now the random variable $\zeta$ by
$\lambda_1 = \lambda_{min} = -\sqrt{2}\left(1 + \frac{\zeta}{2N^{2/3}}\right)$ and performing the limit $N \to \infty$ in (9) keeping $\kappa$ finite
we express the mean number of minima (or maxima) as:

$$\mathcal{N}_m(\kappa) = 2e^{-\frac{\kappa^3}{24}}\int_{-\infty}^{\infty}e^{\frac{\kappa\zeta}{2}}F_1'(\zeta)\,d\zeta \qquad (20)$$

where $F_1'(\zeta) = \frac{dF_1}{d\zeta}$ and

$$F_1(\zeta) = \mathrm{Prob}\left\{\lambda_{max} < \sqrt{2}\left(1 + \frac{\zeta}{2N^{2/3}}\right)\right\} \qquad (21)$$

is the Tracy-Widom distribution [9]. By definition $\mathcal{N}_m(\kappa \to 0) = 2[F_1(\infty) - F_1(-\infty)] =
2$. Near $\kappa = 0$ one finds $\mathcal{N}_m(\kappa) = 2 - 1.20652\kappa + O(\kappa^2)$. For large $\kappa$ the integral is
controlled by the tail of the Tracy-Widom distribution, which takes the form:

$$F_1'(\zeta \to +\infty) \approx \frac{1}{4\sqrt{\pi}\zeta^{1/4}}\,e^{-\frac{2}{3}\zeta^{3/2}} \qquad (22)$$

Exploiting (18) or equivalently (19) we find that $\lim_{\kappa\to\infty}\mathcal{N}_m(\kappa) = 1$. The function
is plotted in Fig. 3.

We thus see that in the second ("edge") scaling region $\sigma \sim N^{-1/6}$ the growing
magnetic field gradually reduces the mean number of minima from two to just a single
minimum. It is also easy to check that the mean number of minima always remains
equal to two in the first ("bulk") scaling limit $\sigma \sim N^{-1/2}$, and we have already seen
it is equal to one for any field of the order of unity. This corresponds to the following
picture: initially at zero field among $2N$ critical points of the cost function there existed
two global minima with exactly equal energies $E_{min} = -\frac{1}{2}N\lambda_{max}$ whose position vectors
were related by the reflection $\mathbf{x} \to -\mathbf{x}$. Any nonzero magnetic field forces those two
critical points to have slightly different energies but as long as the magnitude $\sigma$ satisfies
$\sigma/J \ll N^{-1/6}$ both of them with probability tending to unity retain their identity as
minima. Only when $\sigma/J \sim N^{-1/6}$ the highest of the two minima has a nonvanishing
probability to be converted to a saddle-point with nonzero index, the probability being
higher the bigger is the value of $\kappa = N^{1/3}\sigma^2/J^2$. Finally, for $\kappa \to \infty$ (and in particular,
for $\sigma/J \sim 1$) the probability of having only a single minimum in the energy landscape
tends to unity when $N \to \infty$.

4. Replica trick I: extracting the large deviations rate and pre-exponential 
factors from partition function moments. 

To employ the replica method for our minimization problem we treat it as a problem of Statistical Mechanics, see e.g. [16]. Allowing for the temperature $T > 0$ we start with introducing the partition function associated with the model

$$Z(\beta) = \int e^{-\beta E_h(\mathbf{x})}\, \delta\left(\mathbf{x}^T\mathbf{x} - N\right) d\mathbf{x}, \qquad d\mathbf{x} = \prod_{i=1}^{N} dx_i, \qquad \beta = T^{-1} \tag{23}$$

and consider the integer moments $\langle Z^n(\beta)\rangle$. The Gaussian nature of $E_h(\mathbf{x})$ allows us to perform the ensemble average easily. In particular, rewriting $\sum_{a=1}^{n} \mathbf{x}_a^T H \mathbf{x}_a = \mathrm{Tr}\left[H \sum_{a=1}^{n} \mathbf{x}_a \otimes \mathbf{x}_a^T\right]$ allows one to perform the averaging over $H \in$ GOE by using the identity $\left\langle \exp\left(-\mathrm{Tr}[HA]\right)\right\rangle = \exp\left[\frac{J^2}{4N}\mathrm{Tr}\left(A + A^T\right)^2\right]$, valid for any matrix $A$. Similarly, $\left\langle \exp\left(\beta\, \mathbf{h}^T \sum_{a=1}^{n}\mathbf{x}_a\right)\right\rangle = \exp\left(\frac{\beta^2\sigma^2}{2}\sum_{a,b} \mathbf{x}_a^T\mathbf{x}_b\right)$. The specific rotational invariance of the integrand after the averaging is performed then allows one at the next step to follow the method of [29], see eqs. (10)-(11) of that paper.

Figure 3. Mean number of extrema as a function of $\kappa = N^{1/3}\sigma^2/J^2$ in the second regime $\sigma/J = O(N^{-1/6})$, from the formula (20). The number varies from 2 at $\kappa = 0$ to 1 for $\kappa \to \infty$.

To that end one introduces the $n \times n$ positive semi-definite real symmetric matrix $Q$ of scalar products with entries $q_{ab} = \mathbf{x}_a^T\mathbf{x}_b$ and uses $q_{a \le b}$ as $n(n+1)/2$ new integration variables. Changing after that the scale $Q \to NQ$ we get in the standard way

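$$\langle Z^n \rangle = \mathcal{C}_{N,n} \int_{Q>0} e^{N\Phi_n(Q)}\, (\det Q)^{-(n+1)/2} \prod_{a=1}^{n} \delta(q_{aa} - 1)\, dQ \tag{24}$$

where $\mathcal{C}_{N,n} = C_{N,n}\, N^{n(N-2)/2}$ with $C_{N,n} = \pi^{\frac{n}{2}\left(N - \frac{n-1}{2}\right)} \Big/ \prod_{k=0}^{n-1}\Gamma\left(\frac{N-k}{2}\right)$, and we assumed $N > n + 1$. In the large-$N$ limit the form of the integrand is suggestive of the saddle-point method with the functional to be extremized given by: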



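$$\Phi_n(Q) = \frac{\beta^2 J^2}{4}\,\mathrm{Tr}(Q^2) + \frac{\beta^2\sigma^2}{2}\sum_{a,b} q_{ab} + \frac{1}{2}\,\mathrm{Tr}\ln Q \tag{25}$$

so that the stationarity conditions are

$$\frac{\partial}{\partial q_{ab}}\Phi_n(Q) = \beta^2 J^2 q_{ab} + \beta^2\sigma^2 + (Q^{-1})_{ab} = 0, \qquad \forall\, a < b \tag{26}$$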

Looking for the relevant saddle-point to be replica-symmetric, $q_{aa} = 1$, $q_{a<b} = q$, $\forall\, a < b$, we find that the inverse $Q^{-1}$ has a similar structure with diagonal/off-diagonal entries given by

$$p_d = (Q^{-1})_{aa} = \frac{1 + q(n-2)}{(1-q)(1 + q(n-1))}, \qquad p = (Q^{-1})_{a<b} = \frac{-q}{(1-q)(1 + q(n-1))} \tag{27}$$

It is also easy to show that

$$\det Q = (1 + q(n-1))(1-q)^{n-1} \tag{28}$$

Substituting (27) into (26) we can bring it to the form

$$(J^2 q + \sigma^2)(1-q)(1 + q(n-1)) - T^2 q = 0 \tag{29}$$

Here we study this equation analytically continued for $n = 0$ and $n$ near zero. Then there are generically three roots. Excluding the solution with $q > 1$ leaves two roots, e.g. for $\sigma = 0$ these are $q = 0,\ 1 - T$. For $T < 1$ the root $q = 1 - T$ is the physical solution corresponding to a (replica-symmetric) spin glass phase, which is essentially a "disguised ferromagnet" [8]. For $T > 1$ it is $q = 0$ (paramagnetic phase). In the presence of a random field $\sigma > 0$, the case studied here, the transition at $T = 1$ disappears: one root lies in the interval $0 < q < 1$ and is the physically relevant one, while the second root has $q < 0$ and should not be considered. Here, in addition, we will be interested in the optimization problem, i.e. the zero $T$ limit.
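The root structure described above can be illustrated numerically. The following is a minimal sketch, assuming the $n = 0$ form of (29), $(J^2 q + \sigma^2)(1-q)^2 - T^2 q = 0$; it locates the root in $0 < q < 1$ by bisection and compares it with the low-temperature form $q \approx 1 - Tv$, $v = (J^2+\sigma^2)^{-1/2}$. The helper name `rs_overlap` and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def rs_overlap(T, J=1.0, sigma=0.5, tol=1e-14):
    """Root q in (0, 1) of (J^2 q + sigma^2)(1 - q)^2 - T^2 q = 0,
    i.e. Eq. (29) continued to n = 0. Hypothetical helper for illustration."""
    def g(q):
        return (J**2 * q + sigma**2) * (1.0 - q) ** 2 - T**2 * q

    lo, hi = 0.0, 1.0            # g(0) = sigma^2 > 0 and g(1) = -T^2 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    J, sigma = 1.0, 0.5
    v = 1.0 / math.sqrt(J**2 + sigma**2)
    for T in (0.5, 0.1, 0.01):
        q = rs_overlap(T, J, sigma)
        print(f"T={T:5.2f}  q={q:.6f}  1 - T*v={1.0 - T * v:.6f}")
```

Bisection is used here only because the sign change of the cubic on $[0, 1]$ is guaranteed for $\sigma > 0$; any bracketing root finder would serve equally well.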

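One has then $\langle Z^n \rangle \sim e^{N\Phi_n(Q)}$, where the functional at the saddle point takes the value:

$$\frac{1}{n}\Phi_n = \frac{\beta^2 J^2}{4}\left(1 + (n-1)q^2\right) + \frac{\beta^2\sigma^2}{2}\left(1 + (n-1)q\right) + \frac{1}{2n}\left(\ln(1 + q(n-1)) + (n-1)\ln(1-q)\right) \tag{30}$$

The standard use of the replica trick is for extracting the ensemble-averaged free energy per degree of freedom $\langle f \rangle = -T \lim_{N\to\infty} N^{-1}\langle \ln Z(\beta)\rangle$, which can be done in the replica formalism as

$$\langle f \rangle = -T \lim_{N\to\infty,\, n\to 0} \frac{1}{Nn}\ln\langle Z^n\rangle = -T\lim_{n\to 0}\frac{1}{n}\Phi_n = -\left[\frac{J^2}{4T}\left(1 - q^2\right) + \frac{T}{2}\ln(1-q) + \frac{T}{2}\,\frac{q}{1-q} + \frac{\sigma^2}{2T}(1-q)\right] \tag{31}$$

where $q$ is the solution of (29) for $n = 0$. By definition, the zero-temperature limit of the mean free energy should coincide with the mean of the absolute minimum of the energy functional per degree of freedom $e_{min} = E_{min}(h)/N$, that is $\lim_{T\to 0}\langle f\rangle = \langle e_{min}\rangle$.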
Moreover, as $f$ is known to be self-averaging, the mean and the typical value should coincide. Indeed, by solving (29) in the limit $T \ll 1$ we easily find $q = 1 - Tv$ with $v = (J^2 + \sigma^2)^{-1/2}$. Substituting this into (31) and sending $T \to 0$ gives the finite value

$$\lim_{T\to 0}\langle f\rangle = -\lim_{T\to 0}\left(T \lim_{n\to 0}\frac{1}{n}\Phi_n\right) = -\left[\frac{v}{2}\left(J^2 + \sigma^2\right) + \frac{1}{2v}\right] = -\sqrt{J^2 + \sigma^2}$$

thus indeed reproducing the value $e^{typ}_{min}(\sigma) = E^{typ}_{min}(\sigma)/N$ from (5).
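The zero-temperature limit just quoted can be checked with a few lines of code. This is a sketch under the assumption that the replica-symmetric free energy has the form $\langle f\rangle = -\left[\frac{J^2}{4T}(1-q^2) + \frac{\sigma^2}{2T}(1-q) + \frac{T}{2}\ln(1-q) + \frac{T}{2}\frac{q}{1-q}\right]$ obtained from the $n \to 0$ limit of the saddle-point functional, evaluated on the low-temperature form $q = 1 - Tv$. The function name `free_energy_rs` is hypothetical.

```python
import math

def free_energy_rs(T, J=1.0, sigma=0.5):
    """Replica-symmetric <f> on the Ansatz q = 1 - T v, v = (J^2 + sigma^2)^(-1/2).
    Assumed form: the n -> 0 limit of the saddle-point functional (illustration only)."""
    v = 1.0 / math.sqrt(J**2 + sigma**2)
    x = T * v                      # x = 1 - q, kept explicitly to limit round-off
    q = 1.0 - x
    # (J^2/4T)(1 - q^2) + (sigma^2/2T)(1 - q), using 1 - q^2 = x (1 + q)
    energy_part = J**2 * x * (1.0 + q) / (4.0 * T) + sigma**2 * x / (2.0 * T)
    # (T/2) ln(1 - q) + (T/2) q / (1 - q)
    entropy_part = 0.5 * T * math.log(x) + 0.5 * T * q / x
    return -(energy_part + entropy_part)

if __name__ == "__main__":
    J, sigma = 1.0, 0.5
    target = -math.sqrt(J**2 + sigma**2)
    for T in (1e-1, 1e-3, 1e-6):
        print(f"T={T:.0e}  f={free_energy_rs(T, J, sigma):.6f}  target={target:.6f}")
```

The approach to the limit is slow because of the $T\ln T$ term, which is why a very small temperature is needed to see agreement to several digits.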

One may however observe, cf. [21], that the low-temperature behaviour of the moments $\langle Z^n\rangle$ can in fact be used not only for extracting the mean $\langle E_{min}\rangle(\sigma)$, but also for obtaining the whole large deviation functional of the distribution of the random variable $E_{min}(h)$. We start by assuming that the probability density $\mathcal{P}(E)$ of $E = E_{min}(h)$ takes in the thermodynamic limit $N \gg 1$ a well-defined large-deviation asymptotic form

$$\mathcal{P}(E) \approx R(e)\, e^{-N\mathcal{L}(e)}, \qquad e = E/N \tag{32}$$

with the rate $\mathcal{L}(e)$ and the leading pre-exponential factor $R(e)$. On the other hand we will see below that by scaling the replica index $n$ with temperature as $n = sT$ and keeping $s$ finite when both $T$ and $n$ tend to zero one can also define two functions $g(s)$ and $\phi(s)$ from our (analytically continued) moments in the large $N$ limit:

$$\lim_{n = sT,\, T\to 0}\langle Z^n\rangle \approx g(s)\, e^{N\phi(s)} \tag{33}$$

Hence exploiting $\ln Z(\beta) = -F/T$ and $\lim_{T\to 0} F = E_{min}(h)$ we can now write the chain of identities:

$$\lim_{n = sT,\, T\to 0}\langle Z^n\rangle = \lim_{T\to 0}\left\langle e^{-sF}\right\rangle = \left\langle e^{-sE_{min}(h)}\right\rangle \tag{34}$$

$$= \int e^{-N(se + \mathcal{L}(e))}\, R(e)\, dE \approx g(s)\, e^{N\phi(s)} \tag{35}$$

where in the last step we have applied the saddle-point method for evaluating the integral over the energy $E$, yielding the following consistency relation between these functions:

$$\phi(s) = -\min_{e}\left(se + \mathcal{L}(e)\right) \tag{36}$$

We therefore see that $\phi(s)$ is the Legendre transform of the large-deviation rate function $\mathcal{L}(e)$. Moreover, the same procedure allows one to relate the pre-exponential factor $R(e)$ to $g(s)$ and $\phi(s)$. Namely, by recovering $\mathcal{P}(E)$ from its Laplace transform with the help of the Bromwich integral and employing again the saddle-point method we find that asymptotically

$$\mathcal{P}(E) = \int_{const - i\infty}^{const + i\infty} \frac{ds}{2\pi i}\, e^{sE}\left\langle e^{-sE_{min}(h)}\right\rangle \approx \frac{g(s_*)\, e^{-N\mathcal{L}(e)}}{\sqrt{2N\pi\,|\phi''(s_*)|}} \tag{37}$$

Now we proceed with implementing this program, first for finding the rate function, and then for the pre-exponential factors.
4.1. Rate function calculation

We make for small temperatures the Ansatz $n = sT$, $q = 1 - vT$, where $v > 0$ is expected to remain finite when $T \to 0$. The saddle-point equation (29) in the limit of small $0 < T \ll J$ takes the temperature-independent form $(J^2 + \sigma^2)\,v(v + s) - 1 = 0$, which is solved by

$$v = \frac{1}{2}\left(-s + \sqrt{s^2 + 4B^2}\right), \qquad B^2 = \frac{1}{J^2 + \sigma^2} \tag{38}$$

Similarly, the functional $\Phi_n(Q)$ from (25) is transformed by the same low-temperature Ansatz to:

$$\Phi(s, v) = \frac{J^2}{4}\,s(2v + s) + \frac{\sigma^2}{2}\,s(s + v) + \frac{1}{2}\ln\left(1 + \frac{s}{v}\right) \tag{39}$$

which after substitution of the solution (38) yields the Legendre transform $\phi(s)$ of the rate function in the final form:

$$\phi(s) = \frac{\sigma^2 s^2}{4} + \frac{s\sqrt{s^2 + 4B^2}}{4B^2} + \ln\left[\frac{s + \sqrt{s^2 + 4B^2}}{2B}\right] \tag{40}$$

The large deviation rate function $\mathcal{L}(e)$ of the ground state energy can be found as $\mathcal{L}(e) = -e s_* - \phi(s_*)$, where $s_*$ is the solution of $-e = \phi'(s) = \frac{1}{2}\left(\sigma^2 s + \frac{1}{B^2}\sqrt{s^2 + 4B^2}\right)$. For any $e < e_c$, where $e_c$ is the threshold

$$e_c = -J\sqrt{\frac{J^2 + 2\sigma^2}{J^2 + \sigma^2}} \tag{41}$$

there are two roots to this equation:

$$s_\pm = \frac{2}{J^2(J^2 + 2\sigma^2)}\left(\sigma^2 e \pm (J^2 + \sigma^2)\sqrt{e^2 - e_c^2}\right) \tag{42}$$

which merge at $e_c$. One can check that only the $+$ root satisfies the requirement (36) that the extremum is a minimum, hence we retain it. For $e > e_c$ there is no solution (we consider $e < 0$). In contrast to the $\sigma = 0$ case, note that the typical (intensive) energy and the threshold are now distinct, with $e_{typ} < e_c$. ¶
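The statements about the two roots can be verified directly. The sketch below assumes the expressions $\phi'(s) = \frac12\left(\sigma^2 s + B^{-2}\sqrt{s^2+4B^2}\right)$, $e_c = -J\sqrt{(J^2+2\sigma^2)/(J^2+\sigma^2)}$ and $s_\pm = \frac{2}{J^2(J^2+2\sigma^2)}\left(\sigma^2 e \pm (J^2+\sigma^2)\sqrt{e^2-e_c^2}\right)$; it checks that both roots solve $-e = \phi'(s)$, that only $s_+$ has $\phi''(s) > 0$, and that $s_+$ vanishes at the typical energy $e = -\sqrt{J^2+\sigma^2}$. All function names are illustrative.

```python
import math

def phi_prime(s, J, sigma):
    """First derivative of the Legendre transform phi(s)."""
    B2 = 1.0 / (J**2 + sigma**2)
    return 0.5 * (sigma**2 * s + math.sqrt(s * s + 4.0 * B2) / B2)

def phi_second(s, J, sigma):
    """Second derivative of phi(s); its sign selects the relevant root."""
    B2 = 1.0 / (J**2 + sigma**2)
    return 0.5 * (sigma**2 + s / (B2 * math.sqrt(s * s + 4.0 * B2)))

def threshold(J, sigma):
    """Threshold energy e_c of Eq. (41)."""
    return -J * math.sqrt((J**2 + 2.0 * sigma**2) / (J**2 + sigma**2))

def roots(e, J, sigma):
    """The two roots s_+ and s_- of Eq. (42); requires e <= e_c < 0."""
    ec = threshold(J, sigma)
    root = math.sqrt(e * e - ec * ec)
    pref = 2.0 / (J**2 * (J**2 + 2.0 * sigma**2))
    s_plus = pref * (sigma**2 * e + (J**2 + sigma**2) * root)
    s_minus = pref * (sigma**2 * e - (J**2 + sigma**2) * root)
    return s_plus, s_minus

if __name__ == "__main__":
    J, sigma, e = 1.0, 1.0, -1.5
    for name, s in zip(("s_+", "s_-"), roots(e, J, sigma)):
        print(name, s, "phi'(s) + e =", phi_prime(s, J, sigma) + e,
              "phi''(s) =", phi_second(s, J, sigma))
```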

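Introducing the dimensionless variables $\mathcal{E} = e/J = E/(NJ)$ and $\Gamma = \sigma^2/J^2$ and denoting by the same letter $\mathcal{L}(e) \equiv \mathcal{L}(\mathcal{E})$, we find after straightforward manipulations:

$$\mathcal{L}(\mathcal{E}) = -\frac{\Gamma}{1 + 2\Gamma}\,\mathcal{E}^2 - \frac{1 + \Gamma}{1 + 2\Gamma}\,\mathcal{E}\sqrt{\mathcal{E}^2 - \frac{1 + 2\Gamma}{1 + \Gamma}} - \ln\left[\frac{\sqrt{1 + \Gamma}}{1 + 2\Gamma}\left(-\mathcal{E} + \sqrt{\mathcal{E}^2 - \frac{1 + 2\Gamma}{1 + \Gamma}}\right)\right] \tag{43}$$

This explicit formula for the large deviation rate function $\mathcal{L}(\mathcal{E})$ is one of the main results of the present paper. Let us discuss the behavior of the rate function. It is defined only for $\mathcal{E} < \mathcal{E}_c = -\sqrt{\frac{1 + 2\Gamma}{1 + \Gamma}}$. Note that for $\mathcal{E} = -\sqrt{1 + \Gamma} = \mathcal{E}_{typ} < \mathcal{E}_c$ the rate function vanishes, and that value is simultaneously the minimum of $\mathcal{L}(\mathcal{E})$ (see figure 4).

¶ Note that $s_*$ vanishes at the typical energy and becomes negative for $e_{typ} < e < e_c$, i.e. that region is controlled by a negative number of replicas.

This is consistent with the notion of $\mathcal{E}_{typ}$ as the typical value of the ground energy.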
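The closed form of the rate function can be cross-checked against a direct numerical Legendre transform. The sketch below works in units $J = 1$ (so $\sigma^2 = \Gamma$ and $B^2 = 1/(1+\Gamma)$) and assumes $\phi(s) = \frac{\Gamma s^2}{4} + \frac{s\sqrt{s^2+4B^2}}{4B^2} + \ln\frac{s+\sqrt{s^2+4B^2}}{2B}$ together with the closed form of $\mathcal{L}(\mathcal{E})$ quoted in (43). The stationary point is found by bisection on the branch where $\phi'' > 0$; function names are illustrative.

```python
import math

def phi(s, G):
    """Legendre transform phi(s) in units J = 1, with Gamma = G."""
    B2 = 1.0 / (1.0 + G)
    R = math.sqrt(s * s + 4.0 * B2)
    return G * s * s / 4.0 + s * R / (4.0 * B2) + math.log((s + R) / (2.0 * math.sqrt(B2)))

def phi_prime(s, G):
    B2 = 1.0 / (1.0 + G)
    return 0.5 * (G * s + math.sqrt(s * s + 4.0 * B2) / B2)

def rate_closed(eps, G):
    """Closed-form rate function, valid for eps < eps_c = -sqrt((1+2G)/(1+G))."""
    D = math.sqrt(eps * eps - (1.0 + 2.0 * G) / (1.0 + G))
    return (-G * eps * eps / (1.0 + 2.0 * G)
            - (1.0 + G) / (1.0 + 2.0 * G) * eps * D
            - math.log(math.sqrt(1.0 + G) / (1.0 + 2.0 * G) * (-eps + D)))

def rate_legendre(eps, G):
    """L(eps) = -eps * s_star - phi(s_star), with phi'(s_star) = -eps solved by bisection."""
    s_lo = -2.0 * G / math.sqrt((1.0 + G) * (1.0 + 2.0 * G))   # point where phi'' = 0
    width = 1.0
    while phi_prime(s_lo + width, G) < -eps:                    # bracket the root
        width *= 2.0
    s_hi = s_lo + width
    for _ in range(200):
        mid = 0.5 * (s_lo + s_hi)
        if phi_prime(mid, G) < -eps:
            s_lo = mid
        else:
            s_hi = mid
    s_star = 0.5 * (s_lo + s_hi)
    return -eps * s_star - phi(s_star, G)

if __name__ == "__main__":
    for eps in (-1.45, -1.5, -2.0):
        print(eps, rate_closed(eps, 1.0), rate_legendre(eps, 1.0))
```

Restricting the search to $s \ge s_c$, where $\phi''(s_c) = 0$, is what implements the choice of the $+$ root in this sketch.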



Note also that in the limit of the vanishing magnetic field $\Gamma \to 0$, (43) is reduced to

$$\mathcal{L}(\mathcal{E}) = \mathcal{L}_0(\mathcal{E}) := -\mathcal{E}\sqrt{\mathcal{E}^2 - 1} - \ln\left(-\mathcal{E} + \sqrt{\mathcal{E}^2 - 1}\right) \tag{44}$$

which indeed coincides with the large deviation rate function of $\mathcal{E} = -\frac{1}{2J}\lambda_{max}$, with $\lambda_{max}$ being the maximal eigenvalue of a GOE matrix [11, 12]. The present method does not say anything about large deviations for $\mathcal{E} > \mathcal{E}_c$, but based on the RMT analogue [11, 12] one may conjecture that the rate function should in fact be infinite there, such that the probability of the ground state decays as $\exp(-N^2\, const)$ at $N \gg 1$, see also

Figure 4. Large deviation rate function $\mathcal{L}(e)$ as a function of the (intensive) optimal energy $e = E_{min}/N$, plotted for $\Gamma = 1$, from Eq. (43). The threshold is at $e_c = -\sqrt{3/2} = -1.2247$ and the typical energy corresponds to the minimum of the curve at $e_{typ} = -\sqrt{2} = -1.41421$.



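Around the typical value at fixed $\Gamma > 0$ one has the following behaviour for $\mathcal{E} = \mathcal{E}_{typ} + y$:

$$\mathcal{L}(\mathcal{E}) = \frac{y^2}{\Gamma} + \frac{(\Gamma + 1)^{3/2}\, y^3}{3\Gamma^3} + O(y^4)$$

This implies the Gaussian tails:

$$\mathcal{P}(E) \sim e^{-(E - E_{typ})^2/(N\sigma^2)}, \qquad JN^{1/2} \ll |E - E_{typ}| \ll NJ \tag{45}$$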
It may well be that the distribution is exactly Gaussian in the regime of small deviations $E - E_{typ} \sim JN^{1/2}$, but formally our method does not allow us to infer the precise shape of the density in that regime. On the other hand, for vanishing magnetic field $\Gamma = 0$ we readily see $\mathcal{L}(\mathcal{E}) \approx \frac{2}{3}(-2y)^{3/2}$, which matches the exponent in the tail of the Tracy-Widom distribution (22) if we set $-2y = \zeta/N^{2/3}$. ‡ In our language it can be seen as the consequence of $\mathcal{E}_c = \mathcal{E}_{typ}$ in this limit. To this end it is worth mentioning that for $\Gamma > 0$ the 3/2 power behaviour can still be seen in subleading terms of the expansion around the threshold, $\mathcal{E} = \mathcal{E}_c - z$:

$$\mathcal{L}(\mathcal{E}) = \frac{1}{2}\log(2\Gamma + 1) - \frac{\Gamma}{1 + \Gamma} - \frac{2\Gamma\, z}{\sqrt{(1 + \Gamma)(1 + 2\Gamma)}} + O\left(z^{3/2}\right) \tag{46}$$

For small but finite $\Gamma \ll 1$ the large deviation function takes the following scaling behaviour

$$\mathcal{L}(\mathcal{E}) = \Gamma^3 F\left(\frac{\mathcal{E} - \mathcal{E}_{typ}}{\Gamma^2}\right), \qquad F(x) = \frac{2}{3}\left(\sqrt{1 - 2x} - 1 + \left(3 - 2\sqrt{1 - 2x}\right)x\right) \tag{47}$$

where the function $F(x)$ is defined for $x \in\, ]-\infty, \frac{1}{2}]$.
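The small-$\Gamma$ scaling form can be illustrated numerically. The sketch below assumes the closed-form rate function in units $J = 1$ and the scaling function $F(x) = \frac23\left(\sqrt{1-2x} - 1 + (3 - 2\sqrt{1-2x})\,x\right)$; it evaluates $\mathcal{L}(\mathcal{E}_{typ} + \Gamma^2 x)/\Gamma^3$ at a small value of $\Gamma$ and compares it with $F(x)$. The function names and the chosen value $\Gamma = 10^{-3}$ are illustrative.

```python
import math

def rate_closed(eps, G):
    """Closed-form rate function in units J = 1, valid for eps < eps_c."""
    D = math.sqrt(eps * eps - (1.0 + 2.0 * G) / (1.0 + G))
    return (-G * eps * eps / (1.0 + 2.0 * G)
            - (1.0 + G) / (1.0 + 2.0 * G) * eps * D
            - math.log(math.sqrt(1.0 + G) / (1.0 + 2.0 * G) * (-eps + D)))

def F_scaling(x):
    """Scaling function F(x), defined for x <= 1/2."""
    r = math.sqrt(1.0 - 2.0 * x)
    return (2.0 / 3.0) * (r - 1.0 + (3.0 - 2.0 * r) * x)

def rescaled_rate(x, G):
    """L(eps_typ + G^2 x) / G^3, which should approach F(x) as G -> 0."""
    eps = -math.sqrt(1.0 + G) + G * G * x
    return rate_closed(eps, G) / G**3

if __name__ == "__main__":
    G = 1e-3
    for x in (-4.0, -1.0, -0.1, 0.25):
        print(f"x={x:6.2f}  rescaled={rescaled_rate(x, G):.6f}  F(x)={F_scaling(x):.6f}")
```

For small $|x|$ one has $F(x) \approx x^2$, which reproduces the Gaussian regime, while for $x \to -\infty$ the $\frac23(-2x)^{3/2}$ behaviour takes over.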

As was discussed in the first chapter, the results of the perturbation theory suggest that in the regime $\Gamma \sim N^{-1/3}$ the probabilities of small deviations of the minimum energy from its typical value are expected to be given by a universal family of functions. Using (47) we can therefore predict the tail behaviour for the densities belonging to that family. Namely, we expect that for $N \to \infty$ and $\mathcal{E} = \mathcal{E}_{typ} + \delta N^{-2/3}$, $\Gamma = \kappa N^{-1/3}$ the probability density $\mathcal{P}(\mathcal{E})$ tends to the function $p(\delta, \kappa)$ such that its tail for large negative $\delta$ and large positive $\kappa \gg 1$ has the form

$$p(\delta, \kappa) \propto e^{-\kappa^3 F\left(\delta/\kappa^2\right)}, \qquad \frac{\delta}{\kappa^2} = x < \infty \tag{48}$$

where the scaling function $F(x)$, defined in (47), is universal. As mentioned above, for $x \to -\infty$ one has $F(x) \approx \frac{2}{3}(-2x)^{3/2}$ so as to match with the tail of the Tracy-Widom distribution (22) for $\sigma = 0$ (with $\delta = -\zeta/2$).



4.2. Calculation of the pre-exponential factors

To extract the leading pre-exponential factor in the present formalism we obviously must take into account the Gaussian fluctuations around the replica-symmetric saddle-point solution. Combining (34) and (24) we can write

$$\left\langle e^{-Ns\,e_{min}}\right\rangle = \lim_{n = sT,\, T\to 0}\langle Z^n\rangle \propto e^{N\phi(s)} \lim_{n = sT,\, T\to 0} \frac{(\det Q)^{-(n+1)/2}}{\sqrt{\det A}} \tag{49}$$

since the factor $\mathcal{C}_{N,n} \to 1$ in that limit. * Here $A$ is the $\frac{n(n-1)}{2} \times \frac{n(n-1)}{2}$ matrix of the quadratic form describing the fluctuations around the saddle point, whose entries are given by

$$A_{(ab)(cd)} = \frac{\partial^2 \Phi_n}{\partial q_{(ab)}\,\partial q_{(cd)}} \tag{50}$$

where $a < b$, $c < d$.

‡ For $\sigma = 0$ the same calculation can be easily extended to any $T$, and the corresponding tail of the distribution of the free energy $f$ (coming from the large deviation regime) is found to be $\sim e^{-\frac{2}{3}N(1-T)^{-3/2}(-2y)^{3/2}}$ for $T < 1$ with $f = f_{typ} + Jy$, cf. [25]. It would be interesting to investigate how the small deviation distribution of $f$ relates to the Tracy-Widom distribution in the whole phase $T < 1$.

For the replica-symmetric saddle-point (27) the matrix $A$ has three distinct elements:

$$A_{(ab)(ab)} = \beta^2 J^2 - (p_d^2 + p^2) = A_1, \qquad A_{(ab)(ac)} = -(p_d\, p + p^2) = A_2, \qquad A_{(ab)(cd)} = -2p^2 = A_3 \tag{51}$$

The matrix of such structure was originally diagonalized in the course of the classical De Almeida-Thouless stability analysis [2E], revealing the existence of three distinct eigenvalues $\Lambda_1, \Lambda_2, \Lambda_3$ given explicitly by

$$\Lambda_1 = A_1 + 2(n-2)A_2 + \frac{(n-2)(n-3)}{2}A_3, \qquad \Lambda_2 = A_1 + (n-4)A_2 - (n-3)A_3, \qquad \Lambda_3 = A_1 - 2A_2 + A_3 \tag{52}$$

and the corresponding degeneracies are given by $d_1 = 1$, $d_2 = n - 1$, $d_3 = \frac{n(n-3)}{2}$. We therefore see that

$$\lim_{n = sT,\, T\to 0}\frac{1}{\sqrt{\det A}} = \lim_{n = sT,\, T\to 0}\left[\Lambda_1\, \Lambda_2^{\,n-1}\, \Lambda_3^{\,n(n-3)/2}\right]^{-1/2} = \lim_{n = sT,\, T\to 0}\sqrt{\frac{\Lambda_2}{\Lambda_1}}$$

Substituting here (52), (51) and (27) and further exploiting the low-temperature Ansatz $q = 1 - vT$ we find after straightforward calculations the low-temperature behaviour

$$\Lambda_1 = -\frac{1}{T^3}\,\frac{2v + s}{v^2(v + s)^2} + O\left(\frac{1}{T^2}\right), \qquad \Lambda_2 = -\frac{1}{T^3}\,\frac{2v + 2s}{v^2(v + s)^2} + O\left(\frac{1}{T^2}\right) \tag{53}$$

which implies

$$\lim_{T\to 0}\left(\frac{\Lambda_2}{\Lambda_1}\right)^{1/2} = \left(\frac{2v + 2s}{2v + s}\right)^{1/2} \tag{54}$$
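The low-temperature limit of the eigenvalue ratio can be checked by evaluating the fluctuation eigenvalues at a small but finite temperature. The sketch below assumes the replica-symmetric entries $p_d$, $p$ of $Q^{-1}$, the three elements $A_1 = \beta^2J^2 - (p_d^2 + p^2)$, $A_2 = -(p_d p + p^2)$, $A_3 = -2p^2$, and the eigenvalue combinations $\Lambda_1$, $\Lambda_2$ as reconstructed in the text, all evaluated on the Ansatz $n = sT$, $q = 1 - vT$. The function name `at_eigenvalues` and the parameter values are illustrative.

```python
import math

def at_eigenvalues(s, T, J=1.0, sigma=1.0):
    """Eigenvalues Lambda_1 and Lambda_2 of the fluctuation matrix on the
    Ansatz n = s T, q = 1 - v T. Returns (Lambda_1, Lambda_2, v)."""
    B2 = 1.0 / (J**2 + sigma**2)
    v = 0.5 * (-s + math.sqrt(s * s + 4.0 * B2))   # low-temperature saddle point
    n, x = s * T, v * T                             # x = 1 - q
    d1 = n + x - n * x                              # equals 1 + q (n - 1)
    p_d = (d1 - 1.0 + x) / (x * d1)                 # numerator equals 1 + q (n - 2)
    p = -(1.0 - x) / (x * d1)
    A1 = (J / T) ** 2 - (p_d * p_d + p * p)
    A2 = -(p_d * p + p * p)
    A3 = -2.0 * p * p
    lam1 = A1 + 2.0 * (n - 2.0) * A2 + 0.5 * (n - 2.0) * (n - 3.0) * A3
    lam2 = A1 + (n - 4.0) * A2 - (n - 3.0) * A3
    return lam1, lam2, v

if __name__ == "__main__":
    s = 1.0
    for T in (1e-2, 1e-3, 1e-4):
        lam1, lam2, v = at_eigenvalues(s, T)
        print(f"T={T:.0e}  Lambda_2/Lambda_1={lam2 / lam1:.6f}  "
              f"limit={(2 * v + 2 * s) / (2 * v + s):.6f}")
```

Both eigenvalues arise from a cancellation of terms of order $T^{-4}$, so the quantities $1 - q$ and $1 + q(n-1)$ are formed explicitly from $vT$ and $sT$ to limit round-off.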



Similarly we have for the replica-symmetric $Q$-matrices, using (28),

$$\left.(\det Q)^{-(n+1)/2}\right|_{n = sT,\, T\to 0} = \left(\frac{v}{v + s}\right)^{1/2} \tag{55}$$

Combining all the factors together and using (38) we finally arrive at the full asymptotic large-deviation expression for the Laplace transform of the probability density for the minimum:

$$\left\langle e^{-Ns\,e_{min}}\right\rangle \approx g(s)\, e^{N\phi(s)}, \qquad g(s) = \frac{2B}{\left[\sqrt{s^2 + 4B^2}\left(s + \sqrt{s^2 + 4B^2}\right)\right]^{1/2}} \tag{56}$$



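* We use that $\prod_{k=0}^{n-1}\Gamma\left(\frac{N-k}{2}\right) = \frac{G\left(\frac{N+1}{2}\right)G\left(1 + \frac{N}{2}\right)}{G\left(\frac{N-n+1}{2}\right)G\left(1 + \frac{N-n}{2}\right)}$ in terms of the Barnes function $G(x)$.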
Note that $g(0) = 1$ as required by normalisation. Now we can use (56) and (37) to recover the pre-exponential factor in the probability density $\mathcal{P}(E)$. Recalling the relation $\phi'(s) = \frac{1}{2}\left(\sigma^2 s + \frac{1}{B^2}\sqrt{s^2 + 4B^2}\right)$ we first find

$$\phi''(s) = \frac{s + \sigma^2 B^2\sqrt{s^2 + 4B^2}}{2B^2\sqrt{s^2 + 4B^2}} \tag{57}$$

and then using the relation between $s_*$ and $e$ (42) we further establish the identities:

$$s_* + \sigma^2 B^2\sqrt{s_*^2 + 4B^2} = 2B^2\sqrt{e^2 - e_c^2}, \qquad s_* + \sqrt{s_*^2 + 4B^2} = \frac{2}{J^2 + 2\sigma^2}\left(-e + \sqrt{e^2 - e_c^2}\right) \tag{58}$$

where the threshold $E_c = Ne_c$ was defined in (41). Combining all the formulas we arrive at our final asymptotic large-deviation result for the distribution of the minimum:

$$\mathcal{P}(E) \approx \left[\frac{E_c^2}{N\pi J^2\sqrt{E^2 - E_c^2}\left(-E + \sqrt{E^2 - E_c^2}\right)}\right]^{1/2} e^{-N\mathcal{L}(e = E/N)}, \qquad E < E_c \tag{59}$$

which is one of the main results of our paper.
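One way to see that the prefactor fixes the normalization is to integrate the asymptotic density numerically. The sketch below works in units $J = 1$ with $\mathcal{E} = E/N$, where the density reads $P(\mathcal{E}) = \left[\frac{N\mathcal{E}_c^2}{\pi D(-\mathcal{E} + D)}\right]^{1/2} e^{-N\mathcal{L}(\mathcal{E})}$ with $D = \sqrt{\mathcal{E}^2 - \mathcal{E}_c^2}$, and applies a midpoint rule on an interval ending at the threshold. The cutoff $\mathcal{E}_{typ} - 1$, the grid size and the function names are assumptions made for illustration.

```python
import math

def rate_closed(eps, G):
    """Closed-form rate function in units J = 1, valid for eps < eps_c."""
    D = math.sqrt(eps * eps - (1.0 + 2.0 * G) / (1.0 + G))
    return (-G * eps * eps / (1.0 + 2.0 * G)
            - (1.0 + G) / (1.0 + 2.0 * G) * eps * D
            - math.log(math.sqrt(1.0 + G) / (1.0 + 2.0 * G) * (-eps + D)))

def density(eps, N, G):
    """Asymptotic density of eps = E_min / N including the pre-exponential factor."""
    ec2 = (1.0 + 2.0 * G) / (1.0 + G)
    D = math.sqrt(eps * eps - ec2)
    pref = math.sqrt(N * ec2 / (math.pi * D * (-eps + D)))
    return pref * math.exp(-N * rate_closed(eps, G))

def total_probability(N, G, steps=20000):
    """Midpoint-rule integral of the density over [eps_typ - 1, eps_c]."""
    eps_c = -math.sqrt((1.0 + 2.0 * G) / (1.0 + G))
    a = -math.sqrt(1.0 + G) - 1.0          # far below the typical value for large N
    h = (eps_c - a) / steps
    return h * sum(density(a + (i + 0.5) * h, N, G) for i in range(steps))

if __name__ == "__main__":
    for N in (100, 400, 1600):
        print(N, total_probability(N, 1.0))
```

The result is expected to approach 1 with corrections of relative order $1/N$; the integrable singularity of the prefactor at the threshold is exponentially suppressed and the midpoint rule never evaluates the integrand exactly there.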

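Several comments are in order. First, for any $\Gamma > 0$ one can expand this formula for $\mathcal{P}(E)$ around the most probable value, i.e. $\mathcal{E}$ around $\mathcal{E}_{typ} = -\sqrt{1 + \Gamma}$ as in (45), and obtain:

$$\mathcal{P}(E)\, dE = P(\mathcal{E})\, d\mathcal{E} \approx \left(\frac{N}{\pi\Gamma}\right)^{1/2} e^{-\frac{N}{\Gamma}\left(\mathcal{E} - \mathcal{E}_{typ}\right)^2} d\mathcal{E} \tag{60}$$

hence for any $\Gamma > 0$, thanks to the prefactor, it now reduces to a correctly normalized Gaussian distribution, $\int \mathcal{P}(E)\, dE = 1$, in the regime of typical fluctuations.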



Next, if we naively take the limit $\Gamma = 0$ of (59) we find, for $\mathcal{E} < -1$:

$$\lim_{\Gamma\to 0} P(\mathcal{E})\, d\mathcal{E} = \sqrt{\frac{N}{\pi}}\; \frac{e^{-N\mathcal{L}_0(\mathcal{E})}\, d\mathcal{E}}{\left(\mathcal{E}^2 - 1\right)^{1/4}\sqrt{-\mathcal{E} + \sqrt{\mathcal{E}^2 - 1}}} \tag{61}$$

Interestingly, the pre-exponential factor in (59) has precisely the same structure as the corresponding factor known from the independent non-trivial RMT calculations [13, 18]. If we compare (for convenience) with the formula (16) of Ref. [18], the variable denoted $s$ there being $s = -\sqrt{2}\,\mathcal{E}$ (using again the choice $J^2 = 1/2$), we find that the limit (61) is exactly twice the result (16) of Ref. [18]. § Similarly we can check the tail: replacing in (59) $E \to -\frac{1}{2}N\lambda_{max} = -\frac{N}{\sqrt{2}}\left(1 + \frac{\zeta}{2N^{2/3}}\right)$ one finds to leading order in large $N$:

$$\mathcal{P}(E)\, dE \approx \frac{1}{2\sqrt{\pi}\,\zeta^{1/4}}\, e^{-\frac{2}{3}\zeta^{3/2}}\, d\zeta \tag{62}$$

which is also exactly twice the tail formula (22) for the TW distribution (which verifies that the prefactor in (16) of Ref. [18] matches exactly the large argument limit of the TW law).

§ Of course, as noted above, the exponent term is correct, i.e. $\mathcal{L}_0(\mathcal{E}) = \psi_+(s)$ there.
This mismatch by an overall factor of 2 is puzzling at first, since we claim that a constant multiplicative factor could hardly have been missed in the calculation given the normalization property (60) noted above. After some thought one realizes that it is fixing $\Gamma > 0$ which makes the above saddle-point fluctuation calculation fully controlled at large $N$. The subtlety then likely arises from a non-commutativity of the limits $\Gamma \to 0$ and $N \to \infty$, when the density $\mathcal{P}(E)$ ceases to be Gaussian in the vicinity of the most probable value. In that limit, i.e. taking strictly zero field $\Gamma = 0$ first, the procedure (35) of inferring the pre-exponential factors in $\mathcal{P}(E)$ from its Laplace transform in the large-$N$ limit should be reexamined, as it was based on assuming the analyticity of the function $se + \mathcal{L}(e)$ at the point of its minimum. A plausible scenario behind such a mismatch could be as follows. We have argued before that in the scaling regime $\Gamma \sim N^{-1/3}$ the probability density of the minimal energy in the small-deviation regime is given by a (presumably) universal family of densities parametrized by $\kappa = \Gamma N^{1/3}$, with the standard TW density recovered in the limit $\kappa = 0$. If densities in the family contained a $\kappa$-dependent multiplicative factor which changed smoothly between the values 1/4 for $\kappa = 0$ and 1/2 for $\kappa \to \infty$ (cf. the behaviour of the mean number of extrema in the same regime, Fig. 3), the limits $\Gamma \to 0$ and $N \to \infty$ would not commute in precisely the manner discussed above, explaining the observed mismatch.

Note that the factors in the exponentials match perfectly well, hence this is only a subtlety involving the fluctuations around the saddle point. It is quite possible that the factor of 2 could, in the end, be accounted for by a one-sided saddle point integration, but the details are interesting and deserve further study.

5. Replica trick II: direct approach to the distribution of the ground state. 

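Let us now present an alternative way to extract the probability density of the minimum energy $\mathcal{E} = E_{min}(h)/(NJ)$ based on the identity:

$$\mathcal{P}(\mathcal{E}) = \lim_{\beta\to\infty} P_\beta(\mathcal{E}) \tag{63}$$

where we have introduced

$$P_\beta(\mathcal{E}) = \left\langle\left\langle \delta\left(\mathcal{E} - \frac{E_h(\mathbf{x})}{NJ}\right)\right\rangle_T\right\rangle = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\, e^{-ik\mathcal{E}}\left\langle\left\langle e^{ik\frac{E_h(\mathbf{x})}{NJ}}\right\rangle_T\right\rangle$$

with $\delta(u)$ being the Dirac delta-function and

$$\langle \cdots \rangle_T = \frac{1}{Z}\int d\mathbf{x}\, (\cdots)\, e^{-\beta E_h(\mathbf{x})} \tag{64}$$

standing for the thermal average performed with the Gibbs measure for a single given realization of the disorder. We can now use replicas to express the disorder averages:

$$\left\langle\left\langle \cdots \right\rangle_T\right\rangle = \lim_{n\to 0}\left\langle Z^{n-1}\int d\mathbf{x}\, (\cdots)\, e^{-\beta E_h(\mathbf{x})}\right\rangle$$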
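We apply the same steps as before, the only difference being that one particular replica, labelled as 1, is different from the remaining $n - 1$ ones, leading to:

$$P_\beta(\mathcal{E}) = \mathcal{C}_{N,n}\int_{Q>0} (\det Q)^{-(n+1)/2}\prod_{a=1}^{n}\delta(q_{aa} - 1)\, dQ\; e^{N\Psi_n(Q)} \tag{65}$$

with the new functional:

$$e^{N\Psi_n(Q)} \propto e^{N\Phi_n(Q)}\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\exp\left[-ik\, b_n(Q) - \frac{k^2}{4N}\left(1 + \frac{2\sigma^2}{J^2}\right)\right]$$

where we have defined:

$$b_n(Q) = \mathcal{E} + \frac{\beta}{J}\sum_{a=1}^{n}\left(\frac{J^2}{2}q_{1a}^2 + \sigma^2 q_{1a}\right)$$

using that $q_{11} = 1$. Since the dependence on $k$ is quadratic we can perform the Gaussian integral over $k$, leading to our new functional:

$$\Psi_n(Q) = \Phi_n(Q) - \frac{J^2}{J^2 + 2\sigma^2}\left(\mathcal{E} + \frac{\beta}{J}\sum_{a=1}^{n}\left(\frac{J^2}{2}q_{1a}^2 + \sigma^2 q_{1a}\right)\right)^2$$

The saddle point equations read:

$$\beta^2 J^2 q_{ab} + \beta^2\sigma^2 + (Q^{-1})_{ab} = \delta_{a1}\,\frac{2\beta J\left(J^2 q_{1b} + \sigma^2\right)}{J^2 + 2\sigma^2}\left(\mathcal{E} + \frac{\beta}{J}\sum_{c=1}^{n}\left(\frac{J^2}{2}q_{1c}^2 + \sigma^2 q_{1c}\right)\right), \qquad a < b$$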

It is natural to look for a replica symmetric solution with the following structure: $q_{aa} = 1$, $q_{1b} = q_{b1} = u$ for $b = 2, \ldots, n$, and $q_{ab} = q_{ba} = q$ for $1 < a < b$. Introducing the inverse matrix with parameters $(Q^{-1})_{11} = p_0$, $(Q^{-1})_{aa} = p_d$ for $a \ge 2$, $(Q^{-1})_{1b} = (Q^{-1})_{b1} = \tilde{u}$ for $b \ge 2$, and $(Q^{-1})_{ab} = (Q^{-1})_{ba} = p$ for $b > a \ge 2$, we obtain the four equations:

$$p_0 + (n-1)u\tilde{u} = 1, \qquad \tilde{u} + u\left(p_d + (n-2)p\right) = 0,$$
$$u p_0 + \tilde{u}\left(1 + (n-2)q\right) = 0, \qquad u\tilde{u} + p_d + (n-2)qp = 1 \tag{66}$$

leading to:

$$p = \frac{u^2 - q}{(1-q)\left(1 + q(n-2) - (n-1)u^2\right)}, \qquad \tilde{u} = \frac{-u}{1 + q(n-2) - (n-1)u^2},$$
$$p_0 = \frac{1 + q(n-2)}{1 + q(n-2) - (n-1)u^2}, \qquad p_d = \frac{1 + (n-3)q - (n-2)u^2}{(1-q)\left(1 + q(n-2) - (n-1)u^2\right)} \tag{67}$$

This leads to the following saddle point equations in the limit $n = 0$:

$$\frac{(u^2 - q)T^2}{(1-q)(1 - 2q + u^2)} + J^2 q + \sigma^2 = 0$$
$$-\frac{T^2 u}{1 - 2q + u^2} + J^2 u + \sigma^2 = \frac{2J\left(J^2 u + \sigma^2\right)}{J^2 + 2\sigma^2}\left(\mathcal{E}T + \frac{1}{J}\left(\frac{J^2}{2}(1 - u^2) + \sigma^2(1 - u)\right)\right) \tag{68}$$
We can solve these equations at low $T$ by inserting the following expansion:

$$q = 1 - Tv + T^2 w + O(T^3), \qquad u = 1 - Tv + T^2 r + O(T^3)$$

and we find $v^2 + 2\mathcal{E}v = -\mathcal{E}_c^2$, i.e.

$$v = -\mathcal{E} \pm \sqrt{\mathcal{E}^2 - \mathcal{E}_c^2}, \qquad \mathcal{E}_c = -\sqrt{\frac{1 + 2\Gamma}{1 + \Gamma}}, \qquad r - w = 1 + \mathcal{E}v \tag{69}$$

and we recall that $\Gamma = \sigma^2/J^2$.

To calculate the functional at the saddle point we need to evaluate $\mathrm{Tr}\ln(Q)$.
The eigenvalues of Q are displayed in the Appendix of 



$1 - q$, with multiplicity $d = n - 2$, and

$$\mu_\pm = \frac{1}{2}\left(2 + (n-2)q \pm \sqrt{(n-2)^2 q^2 + 4u^2(n-1)}\right) \tag{70}$$

This leads to:

$$\det Q = (1-q)^{n-2}\left(1 + q(n-2) - u^2(n-1)\right)$$

and also for $n = 0$:

$$\mathrm{Tr}\, Q^2 = 2(q^2 - u^2)$$

$$\mathrm{Tr}\ln(Q) = -2\ln(1-q) + \ln(1 - 2q + u^2) \tag{71}$$

$$\sum_{a,b} q_{ab} = 2(q - u) \tag{72}$$

which then gives:

$$\Psi_n(Q) = -\ln(1-q) + \frac{1}{2}\ln(1 - 2q + u^2) + \frac{\beta^2 J^2}{2}(q^2 - u^2) + \beta^2\sigma^2(q - u) - \frac{J^2}{J^2 + 2\sigma^2}\left(\mathcal{E} + \frac{\beta}{J}\left(\frac{J^2}{2}(1 - u^2) + \sigma^2(1 - u)\right)\right)^2 \tag{73}$$

Its zero temperature limit $T = 0$ is found to be:

$$\lim_{T\to 0}\Psi_n(Q) = (1 + \Gamma)(w - r) - \frac{\left(\mathcal{E} + (1 + \Gamma)v\right)^2}{1 + 2\Gamma} + \frac{1}{2}\ln\left(1 + \frac{2(r - w)}{v^2}\right)$$

Choosing the $-$ branch in (69) we recover the formula (43). More precisely, from (63), (65) and the definition of the large deviation rate function (32):

$$\mathcal{P}(\mathcal{E}) \sim e^{-N\mathcal{L}(\mathcal{E})}, \qquad \mathcal{L}(\mathcal{E}) = -\lim_{T\to 0}\Psi_n(Q) \tag{74}$$

Hence this more direct method to calculate the probability distribution gives a result identical to that of the more conventional method of the previous Section, which uses the analytical continuation from integer moments via the replica saddle point. While the previous method used the scaling $n = sT$, the present method works directly at $n = 0$.
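The agreement between the two replica routes can be confirmed numerically. The sketch below, in units $J = 1$, assumes the zero-temperature value of the direct functional in the form $(1+\Gamma)(w - r) - \frac{(\mathcal{E} + (1+\Gamma)v)^2}{1+2\Gamma} + \frac12\ln\left(1 + \frac{2(r-w)}{v^2}\right)$, evaluated on the $-$ branch $v = -\mathcal{E} - \sqrt{\mathcal{E}^2 - \mathcal{E}_c^2}$ with $r - w = 1 + \mathcal{E}v$, and compares its negative with the closed-form rate function. The function names are illustrative.

```python
import math

def rate_closed(eps, G):
    """Closed-form rate function from the moment (Legendre) route, units J = 1."""
    D = math.sqrt(eps * eps - (1.0 + 2.0 * G) / (1.0 + G))
    return (-G * eps * eps / (1.0 + 2.0 * G)
            - (1.0 + G) / (1.0 + 2.0 * G) * eps * D
            - math.log(math.sqrt(1.0 + G) / (1.0 + 2.0 * G) * (-eps + D)))

def rate_direct(eps, G):
    """Minus the T -> 0 value of the direct functional on the '-' branch for v."""
    ec2 = (1.0 + 2.0 * G) / (1.0 + G)
    v = -eps - math.sqrt(eps * eps - ec2)
    r_minus_w = 1.0 + eps * v
    psi0 = (-(1.0 + G) * r_minus_w
            - (eps + (1.0 + G) * v) ** 2 / (1.0 + 2.0 * G)
            + 0.5 * math.log(1.0 + 2.0 * r_minus_w / (v * v)))
    return -psi0

if __name__ == "__main__":
    for eps, G in ((-1.5, 1.0), (-2.0, 1.0), (-2.0, 0.3)):
        print(eps, G, rate_closed(eps, G), rate_direct(eps, G))
```

Note that on this branch $v\,(-\mathcal{E} + \sqrt{\mathcal{E}^2 - \mathcal{E}_c^2}) = \mathcal{E}_c^2$, which is what makes the logarithmic terms of the two expressions coincide.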
6. Conclusions and Open Problems

We have demonstrated that despite its deceptive simplicity the problem of describing statistics of the minima of a cost function given by the sum of a random quadratic and a random linear form in $N$ real variables over the $(N-1)$-dimensional sphere has a rather rich phenomenology, and generates quite a few open questions. The existence of two nontrivial scaling regimes is intimately connected with properties of random matrix spectra. Yet the standard RMT spectral methods and techniques, being very useful for the problem of counting various types of critical points in the cost function landscape, do not seem to be of obvious utility for extracting the statistics of minima beyond the perturbation theory. Thus, for getting explicit analytical insights into the statistical characteristics of the global minimum we had to resort to the powerful heuristic method of Statistical Mechanics, the replica trick. Note that the replica methods have recently allowed one to unveil the convergence to Tracy-Widom distributions of the free energy of directed polymers in random media and of the height field of the Kardar-Parisi-Zhang growth equation [31, 32, 33, 34], and it seems an important goal to understand whether these approaches can extend to random matrices as well. We have indeed found that the large-deviation results extending those known in the random matrix theory can be successfully reproduced by replicas. It of course remains an obvious challenge to find rigorous ways of confirming our large-deviation results, not to mention extending these considerations to the level of (Tracy-Widom like) small deviations in the corresponding scaling regime, as well as investigating the issue of universality.

Even at the level of perturbation theory the problem touches on poorly explored RMT problems like the parametric motion of extreme eigenvalues. In general, clarifying the RMT content of the quadratic eigenvalue problem in question, such as the gradual reduction of the number of real solutions of the characteristic equation, remains an interesting open task. It goes without saying that all the same questions can be asked (and, to the extent covered in the paper, answered) for complex quadratic and linear forms, with GUE matrices $H$ replacing the GOE ones. Completely open is the question of investigating all aspects of the same problem for quadratic forms based on non-invariant ensembles of random matrices, such as various matrices with i.i.d. entries (Wigner, sparse, banded, etc.).

Finally, it is natural to expect that the zero-temperature gradient descent dynamics (or, more generally, Langevin dynamics with a noise simulating finite temperatures) should also reflect the existence of the two scaling regimes of the small magnetic field revealed by our considerations.